PREFACE


This  volume  comprises  the  Proceedings  of  the  Symposium  on 
Mathematical  Pattern  Recognition  and  Image  Analysis  (MPRIA)  held  June  1-3, 
1983,  at  the  NASA/Johnson  Space  Center,  Houston,  Texas. 

The  Symposium  was  initiated  with  a brief  Program  Overview  presented 
by  Drs.  Howard  G.  Hogg,  NASA  Headquarters,  and  R.  P.  Heydorn,  NASA/JSC. 

The  first  paper  appearing  in  the  Proceedings  was  prepared  by 
Professor  Robert  M.  Haralick  in  support  of  his  excellent  invited  keynote 
address.  The  remaining  eighteen  papers  of  the  Proceedings  present  the 
results  of  various  research  efforts  initiated  during  FY  1982  as  part  of 
NASA's  Remote  Sensing  Research  Program.  Five  of  the  papers  present 
results  from  the  four  research  efforts  carried  out  by  the  following  NASA 
principal  investigators: 

R. P. Heydorn - NASA/Johnson Space Center
David D. Dow - National Space Technology Laboratories
Manouher Naraghi - Jet Propulsion Laboratory
Daniel N. Held - Jet Propulsion Laboratory

The remaining thirteen papers present results from the eleven research
efforts  initiated  July  16,  1982,  under  Contract  NAS  9-16664  and  carried 
out  by  the  following  principal  investigators: 

H.  P.  Decell,  Jr./B.  C.  Peters,  Jr.  - University  of  Houston 
Carl  Morris  - University  of  Texas  at  Austin 
L.  Schumaker/L.  F.  Guseman,  Jr.  - Texas  A&M  University 
K.  S.  Shanmugan  - University  of  Kansas 
E.  Parzen/W.  B.  Smith  - Texas  A&M  University 
A. H. Strahler - Hunter College
Waldo Tobler - University of California, Santa Barbara
E. M. Mikhail - Purdue University
Grahame Smith - SRI International
L. Kanal - LNK Corporation
L. S. Davis/A. Rosenfeld - University of Maryland

In an attempt to group presentations of a similar nature, the
Symposium  was  divided  into  three  MATH/STAT  SESSIONS  and  two  PATTERN 
RECOGNITION  sessions.  This  grouping  also  reflects  the  topical  contents  of 
the  MPRIA  Technical  Workshops  on  MATH/STAT  and  PATTERN  RECOGNITION  held 
January  27-28,  1983  and  February  3-4,  1983,  respectively. 

The  papers  appear  in  the  Proceedings  in  the  order  in  which  they  were 
presented  at  the  Symposium.  An  agenda  and  a list  of  attendees  who 
registered  for  the  Symposium  are  included  in  the  Appendix. 

L.  F.  Guseman,  Jr. 

Principal  Investigator  and 
MPRIA  Program  Coordinator 
Contract  NAS  9-16664 

KEYNOTE ADDRESS


RELATIVE ELEVATION DETERMINATION FROM LANDSAT IMAGERY

R. M. Haralick, S. Wang

Dept. of Electrical Engineering and Computer Science
Virginia Polytechnic Institute and State University


ABSTRACT - In LANDSAT imagery, spectral and spatial information can be
used to detect the drainage network as well as the relative elevation
model in mountainous terrain. To do this, the mixed information of
material reflectance and topographic modulation in the original
LANDSAT imagery must first be separated. From the material
reflectance information, large visible rivers can be detected. From
the topographic modulation information, ridges and valleys can be
detected and assigned relative elevations. Finally, a relative
elevation model can be generated by interpolating values for
non-ridge and non-valley pixels.

1. Introduction

It is a common task for a photointerpreter to examine the spatial
pattern on an aerial image and, by appropriate interpretation, to
tell the elevation of one area relative to another and to infer the
stream network and the drainage network even though some of the
streams may be below the resolution of the sensor. There is a wealth
of information in spatial patterns on aerial imagery, but most
computer data processing of remotely sensed imagery, being limited to
pixel spectral characteristics, does not make use of it.

In this paper, we describe a procedure by which a relative elevation
model can be inferred from a LANDSAT scene of mountainous and hilly
terrain. The processing has a number of distinctly different steps.
First, to prepare the imagery for processing, we must destripe it and
perform haze removal. Destriping can be done by the Horn and Woodham
[1979] technique. Haze removal can be done by the Switzer, Kowalik
and Lyon [1981] technique. These two steps constitute the
preprocessing and are not discussed in this paper.

To a first-order approximation, after preprocessing, the intensity
value at any pixel is due to the combined effect of the angle at
which the sun illuminates the ground patch corresponding to the pixel
and the reflectance of the surface material on the ground patch.
Making sense of the spatial pattern first requires separating these
two effects. For this purpose we modify the Eliason, Soderblom and
Chavez [1981] technique to create two main images from the LANDSAT
imagery. The first image is a reflectance image and the second image
is a topographic modulation image which has information related to
surface slope and sun illumination. The details of this technique
are given in Section 2.

As discussed in Section 3, the reflectance image can be used by the
Alfoldi and Munday [1978] procedure for identification of all areas
of water. The topographic modulation image can be used to identify
the ridges and the valleys. This is discussed in Section 4. With the
valleys identified, each valley pixel may be assigned a relative
elevation which increases as the length of the valley path from the
pixel to the river it empties into increases. Ridges must be assigned
elevations higher than their neighboring valleys, and each ridge
pixel can be assigned a relative elevation which decreases along the
ridge path from the pixel to the saddle point where the ridge crosses
a valley. The ridge-valley elevation assignment procedure is
discussed in Section 5. Once ridges and valleys have been located and
assigned relative elevations, a complete elevation model can be
generated by interpolating values for non-ridge and non-valley
pixels. The interpolation procedures are discussed in Section 6.

Since the launch of the first Earth Resources Technology Satellite
(ERTS, later renamed LANDSAT) in July 1972, much work in remote
sensing has been done by using pattern analysis and picture
processing techniques for image classification, restoration and
enhancement. Few people have tried the scene analysis or artificial
intelligence approach, which describes the image in terms of the
properties of objects or regions in the image and the relationships
between them. Ehrich [1977] found global lineaments by partitioning
the image into windows and applying long, straight linear filters at
different orientations in each window to extract local evidence.
Dynamic programming [Montanari, 1971; Martelli, 1972] was then used
to form complete global lineaments. VanderBrug [1976] tested various
detectors to extract linear features in satellite imagery. This was
only at the local level. Later, VanderBrug [1977a] used relaxation to
reduce noise in the output. Finally, VanderBrug [1977b] defined a
merit function that can be used to select pairs of segments to be
merged so that local line detector responses can be linked together
into a global representation of the curves. His work is closely
related to the Shirai [1973] technique, which employed sequential
line following to find edges in scenes containing polyhedra. Li and
Fu [1976] used tree grammars to locate highways and rivers in LANDSAT
pictures. The above investigations deal with the extraction of all
the linear features from an image, but they do not deal with the
interpretation of these linear features. In the following
investigations, knowledge about the desired features is considered
crucial to the analysis.

Bajcsy and Tavakoli [1975] argued that an image filter is not
meaningful unless one has a world model, a description of the world
one is dealing with. They recognized objects matching this
description and filtered them out. This strategy is used to sequence
the recognition of bridges, rivers, lakes, and islands from satellite
pictures. Nagao and Matsuyama [1980] built an image understanding
system that automatically located a variety of objects in an aerial
photograph by using diverse knowledge of the world. It is one of the
first image understanding systems to incorporate very sophisticated
artificial intelligence techniques into the analysis of complex
aerial photographs. Fischler, Tenenbaum and Wolf [1981] designed a
low-resolution road tracking (LRRT) algorithm for aerial imagery. The
approach was based on a new paradigm for combining local information
from multiple sources, map knowledge, and generic knowledge about
roads. The final interpretation of the scene was achieved by using
either graph search or dynamic programming.

Similarly, knowledge is important in our problem, which requires
analysis at both the local and global levels. Local level analysis
is discussed in Sections 2 to 4; global level analysis is discussed
in Sections 5 and 6.

2. Illumination Model

The brightness and darkness in each band of LANDSAT images come from
two main sources. First, they can be due to material properties. For
example, in the spectral region (0.8 - 1.1 μm) of band 7, water
bodies absorb infrared radiation, so they appear as clearly
delineated dark bodies; living vegetation reflects strongly in this
portion of the infrared, so areas of living green vegetation appear
as bright regions. Second, they may be due to topography and sun
illumination angle effects. The mountain side facing the sun appears
as a bright region; the mountain side facing away from the sun may
appear as a shadow or dark region. Unfortunately, the LANDSAT data
values are some combination of these two effects. Eliason, Soderblom,
and Chavez [1981] address this problem by defining an illumination
model involving material reflectance and topographic modulation
images. In the following, we introduce a modified Lambertian model in
which the information of diffuse light and shadows is also included.

For a pixel (x,y) which receives sunlight, the original LANDSAT image
G measuring the amount of reflected light at band b is

$$G(x,y,b) = r(x,y,b)\,I(b)\cos\theta(x,y) + r(x,y,b)\,D(b) + H(b)$$

where r is the surface reflectance, I is the illumination flux, θ is
the angle between the sun incidence direction and the surface normal,
D is the diffuse light, and H is the haze due to atmospheric
scattering. On the other hand, for a pixel (x,y) in shadow, G is
simply

$$G(x,y,b) = r(x,y,b)\,D(b) + H(b)$$

After the haze H(b) is removed by the Switzer, Kowalik and Lyon
[1981] technique, for pixels receiving sunlight, the ratio image of
bands $b_1$ and $b_2$ is

$$\frac{G'(x,y,b_1)}{G'(x,y,b_2)} = \frac{G(x,y,b_1) - H(b_1)}{G(x,y,b_2) - H(b_2)} = \frac{r(x,y,b_1)\,[I(b_1)\cos\theta(x,y) + D(b_1)]}{r(x,y,b_2)\,[I(b_2)\cos\theta(x,y) + D(b_2)]} = a\,\frac{r(x,y,b_1)}{r(x,y,b_2)}$$

if we assume $I(b_1) = a\,I(b_2)$ and $D(b_1) = a\,D(b_2)$.

Similarly, for pixels in shadow,

$$\frac{G'(x,y,b_1)}{G'(x,y,b_2)} = a\,\frac{r(x,y,b_1)}{r(x,y,b_2)}$$

In either case, the ratio is independent of cos θ. Thus, by
clustering using different ratio images as features, the pixels
grouped in one cluster should belong to the same material reflectance
group. The result is called a reflectance cluster image.
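The cancellation of cos θ in the band ratio can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names and all numeric values are invented for illustration, and it assumes a single constant reflectance per band over the whole window.

```python
import numpy as np

def simulate_band(r, I, D, H, cos_theta, lit):
    """Modified Lambertian model: lit pixels receive direct plus diffuse
    light, shadowed pixels receive diffuse light only; haze H is additive."""
    direct = np.where(lit, r * I * cos_theta, 0.0)
    return direct + r * D + H

def dehazed_ratio(G1, G2, H1, H2):
    """Ratio image of two bands after subtracting the haze terms."""
    return (G1 - H1) / (G2 - H2)

# Illustrative values only (hypothetical, not taken from the paper).
rng = np.random.default_rng(0)
shape = (4, 5)
cos_theta = rng.uniform(0.1, 1.0, shape)   # varying topographic modulation
lit = rng.random(shape) > 0.3              # mix of lit and shadowed pixels
r1, r2 = 0.30, 0.20                        # reflectances in bands b1 and b2
a = 1.5                                    # assumed I(b1)=a*I(b2), D(b1)=a*D(b2)
I2, D2 = 100.0, 10.0
I1, D1 = a * I2, a * D2
H1, H2 = 7.0, 4.0

G1 = simulate_band(r1, I1, D1, H1, cos_theta, lit)
G2 = simulate_band(r2, I2, D2, H2, cos_theta, lit)
ratio = dehazed_ratio(G1, G2, H1, H2)

# Every pixel, lit or shadowed, should give a * r1 / r2.
print(np.allclose(ratio, a * r1 / r2))
```

Because the ratio is constant for a given material, ratio images formed from several band pairs are reasonable clustering features even in rugged terrain.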


With $R_c$ denoting the reflectance cluster image and $N_c$ the number
of clusters, the set of pixels in cluster cl is

$$C(cl) = \{(x,y) \mid R_c(x,y) = cl\}, \quad 1 \le cl \le N_c$$

The pixels in each C(cl) do not have identical gray tone intensities
in the dehazed G' image. This is due to the fact that some pixels are
directly lit and others are in shadow. By performing a second
clustering on G' within each C(cl), we can split each C(cl) into a
bright sub-cluster $C_0(cl)$ consisting of directly lit pixels and a
dark sub-cluster $C_1(cl)$ consisting of pixels in shadow. A binary
shadow image Sw can be defined by

$$Sw: X \times Y \to \{0,1\}, \qquad Sw(x,y) = \begin{cases} 0 & \text{if } (x,y) \in C_0(R_c(x,y)) \\ 1 & \text{if } (x,y) \in C_1(R_c(x,y)) \end{cases}$$

This is shown in Figure 4.


Figure  4 - Binary  shadow  image. 
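
The second clustering step can be sketched as follows. The text does not say which clustering method is used, so this sketch assumes a simple one-dimensional two-means split of the dehazed gray tones inside each reflectance cluster, and it assumes the convention that 1 marks shadow and 0 marks directly lit pixels. The function names are hypothetical.

```python
import numpy as np

def two_means_threshold(values, iters=50):
    """1-D two-means clustering; returns the midpoint between the dark
    and bright cluster centers."""
    lo, hi = float(values.min()), float(values.max())
    for _ in range(iters):
        t = 0.5 * (lo + hi)
        dark, bright = values[values <= t], values[values > t]
        if dark.size == 0 or bright.size == 0:
            break
        new_lo, new_hi = float(dark.mean()), float(bright.mean())
        if new_lo == lo and new_hi == hi:
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return 0.5 * (lo + hi)

def shadow_image(G_prime, R_c):
    """Binary shadow image: 1 for the dark sub-cluster of each
    reflectance cluster, 0 for the bright sub-cluster (assumed coding)."""
    Sw = np.zeros(G_prime.shape, dtype=np.uint8)
    for cl in np.unique(R_c):
        members = R_c == cl
        values = G_prime[members]
        if values.min() == values.max():
            continue  # no contrast: leave the whole cluster marked as lit
        t = two_means_threshold(values)
        Sw[members & (G_prime <= t)] = 1
    return Sw

# Toy example: one dehazed band and a reflectance cluster image.
G_prime = np.array([[10., 12., 50., 52.],
                    [11.,  9., 51., 49.],
                    [100., 200., 7., 7.]])
R_c = np.array([[1, 1, 1, 1],
                [1, 1, 1, 1],
                [2, 2, 3, 3]])
Sw = shadow_image(G_prime, R_c)
```

Splitting within each cluster, rather than thresholding the whole image, keeps a dark material in full sun from being confused with a bright material in shadow.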

After the lit and shadowed pixels are identified, we extract a
diffuse light image $D_f$ which contains in each pixel (x,y) the value
r(x,y,b)D(b), a reflectance image R which contains in each pixel
(x,y) the value r(x,y,b)I(b), and a topographic modulation image
$T_p$ which contains in each pixel (x,y) the value cos θ(x,y). Thus,
for directly lit pixels

$$G'(x,y,b) = R(x,y,b)\,T_p(x,y) + D_f(x,y,b), \qquad (*)$$

and for shadowed pixels

$$G'(x,y,b) = D_f(x,y,b)$$


Since shadowed pixels contain the information of diffuse light only,
the mean dehazed G' value of pixels in $C_1(cl)$ can be used to
represent the reflected diffuse light information for cluster cl. The
diffuse light image is defined by

$$D_f(x,y,b) = \sum_{(u,v) \in C_1(cl)} \frac{G'(u,v,b)}{\# C_1(cl)}$$

where $cl = R_c(x,y)$ and # denotes the number of pixels in a set. If
the reflectance cluster image were perfect, we would have

Assumption 1: r(x,y,b) is a constant r(cl,b) for all (x,y) in C(cl)
with $cl = R_c(x,y)$.

In this case,

$$D_f(x,y,b) = r(cl,b)\,D(b) \sum_{(u,v) \in C_1(cl)} \frac{1}{\# C_1(cl)} = r(cl,b)\,D(b)$$

Since directly lit pixels contain the information of diffuse light as
well as direct sun illumination, the mean $G' - D_f$ value of pixels
in $C_0(cl)$ can be used to represent the reflected sun illumination
information for cluster cl. If pixel (x,y) is in reflectance cluster
cl, that is, if $R_c(x,y) = cl$, then the reflectance image R can be
defined by

$$R(x,y,b) = \sum_{(u,v) \in C_0(cl)} \frac{G'(u,v,b) - D_f(u,v,b)}{\# C_0(cl)} = r(cl,b)\,I(b) \sum_{(u,v) \in C_0(cl)} \frac{\cos\theta(u,v)}{\# C_0(cl)} = r(cl,b)\,I(b)\,X_c(cl)$$

where $X_c(cl)$ is the spatial average of cos θ for pixels in
$C_0(cl)$. It is meaningful to look at the R image only if we make the
following assumption.

Assumption 2: $X_c(cl)$ takes the same value $X_c$ for all reflectance
clusters.

Finally, from equation (*),

$$T_p(x,y) = \frac{G'(x,y,b) - D_f(x,y,b)}{R(x,y,b)} = \frac{\cos\theta(x,y)}{X_c}$$

which contains the information about the cosine of the angle between
the surface normal and the illumination direction.

The $D_f$, R, and $T_p$ images for Figure 1 are shown in Figures 5, 6,
and 7.
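
The three images can be computed per cluster directly from these definitions. The sketch below is an illustration under stated assumptions, not the authors' code: the function name `decompose` is hypothetical, the shadow image is assumed to use 1 for shadow and 0 for lit pixels, and every reflectance cluster is assumed to contain at least one lit and one shadowed pixel.

```python
import numpy as np

def decompose(G_prime, R_c, Sw):
    """Split one dehazed band into diffuse light (D_f), reflectance (R)
    and topographic modulation (T_p) images.

    G_prime : dehazed band, 2-D float array
    R_c     : reflectance cluster label per pixel
    Sw      : binary shadow image, 1 = shadow, 0 = lit (assumed coding)
    """
    D_f = np.zeros(G_prime.shape, dtype=float)
    R = np.zeros(G_prime.shape, dtype=float)
    for cl in np.unique(R_c):
        members = R_c == cl
        lit = members & (Sw == 0)
        shadow = members & (Sw == 1)
        # Mean shadowed value estimates the reflected diffuse light.
        D_f[members] = G_prime[shadow].mean()
        # Mean of (G' - D_f) over lit pixels estimates r * I * X_c.
        R[members] = (G_prime[lit] - D_f[lit]).mean()
    T_p = (G_prime - D_f) / R
    return D_f, R, T_p

# Synthetic single-cluster check with r = 0.5, I = 100, D = 10:
# lit pixels have G' = 50 * cos(theta) + 5, shadowed pixels have G' = 5.
cos_theta = np.array([0.2, 0.4, 0.6, 0.8])
G_prime = np.array([[15., 25., 35.],
                    [45.,  5.,  5.]])
Sw = np.array([[0, 0, 0],
               [0, 1, 1]])
R_c = np.ones((2, 3), dtype=int)
D_f, R, T_p = decompose(G_prime, R_c, Sw)
```

In this synthetic case the mean of cos θ over the lit pixels is 0.5, so the recovered T_p on lit pixels equals cos θ divided by 0.5, which matches the last equation above.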

3. Detection of Visible Rivers

Visible river detection can play an important part in generating an
elevation model since elevations must increase away from the river.
Visible rivers can be detected using the material reflectance image
created by the technique discussed in the last section. In the
spectral region (0.8 - 1.1 μm) of band 7, water bodies absorb
infrared radiation, so visible rivers appear as dark curves, and
lakes appear as dark regions. In the material reflectance image of
band 7, these dark features become clearer because shadows are
removed. However, not all dark features are water bodies; the real
water bodies can be identified by the following process [Alfoldi and
Munday, 1978].

(1) A band 4 green coefficient x of every pixel is calculated as the
ratio of the radiance of band 4 to the radiance sum of bands 4, 5 and
6. Similarly, a band 5 red coefficient y is calculated for every
pixel. x and y are called LANDSAT chromaticity coordinates.

(2) In this coordinate system, Munday [1974] has determined a curve
(Figure 8) which is the locus of the positions of chromaticity values
of water bodies. If, for some pixels, the x, y values calculated in
step (1) are close to this curve, then those pixels can be identified
as portions of water bodies.
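
The two steps above can be sketched as follows. The chromaticity computation follows the text; the locus test is only an assumption about how "close to this curve" might be evaluated, using the distance to the nearest sampled point of the curve. The locus samples and the tolerance below are placeholders, not the published Munday curve.

```python
import numpy as np

def chromaticity(b4, b5, b6):
    """LANDSAT chromaticity coordinates: x = b4 / (b4 + b5 + b6),
    y = b5 / (b4 + b5 + b6). Pixels with zero total radiance give NaN."""
    b4, b5, b6 = (np.asarray(b, dtype=float) for b in (b4, b5, b6))
    total = b4 + b5 + b6
    total = np.where(total == 0, np.nan, total)
    return b4 / total, b5 / total

def near_locus(x, y, locus, tol):
    """True where (x, y) lies within tol of the nearest sampled locus
    point. locus is an (N, 2) array of (x, y) samples of the curve."""
    pts = np.stack([x.ravel(), y.ravel()], axis=1)
    d = np.linalg.norm(pts[:, None, :] - locus[None, :, :], axis=2).min(axis=1)
    return (d <= tol).reshape(x.shape)

# Placeholder radiances and a one-point placeholder "locus".
b4 = np.array([[2.0, 1.0]])
b5 = np.array([[1.0, 1.0]])
b6 = np.array([[1.0, 2.0]])
x, y = chromaticity(b4, b5, b6)
water = near_locus(x, y, locus=np.array([[0.5, 0.25]]), tol=0.05)
```

A denser sampling of the curve, or a point-to-segment distance, would give a smoother test; the nearest-sample distance is used here only to keep the sketch short.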

4. Ridge-Valley Detection

In  this  section,  we  describe  how  to  extract  shadowed  and 
bright  areas,  create  linear  features  on  the  borders  between 
those  areas,  and  then  classify  these  linear  features  into 
ridge  and  valley  segments.  In  the  next  two  sections,  we 
discuss  how  to  generate  a relative  elevation  model. 

From the shadow image of Figure 4, we can get the connected
components of bright and shadowed regions. Because valleys and ridges
exist on the borders between these regions, the perimeters of these
bright and shadowed regions are segmented into border segments
according to their left regions, right regions, and orientations. A
border segment is a maximally long sequence of connected pixels which
are on the border between two given regions. Because the detection of
ridges and valleys is highly orientation-dependent and the sun
illumination comes from the east in Figure 1, each border segment is
further broken into several pieces according to orientation: all the
east-west parts are separated from the north-south parts. The result
is shown in Fig-

As the sun illumination comes from the east, those border segments
which are valley segments or ridge segments can be identified
according to the brightness of their left and right regions. Because
most of the trees in this area are unfoliated in April, the strongest
region boundaries are shadow boundaries rather than tonal boundaries,
and the strongest boundaries are those at the extremes of steep
slopes oriented normal to the sun direction. Because the sun
illumination is predominantly east-west, a boundary that is dark on
the left and bright on the right will correspond to a ridge, and the
reverse will correspond to a valley.
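
The left/right brightness rule for north-south boundaries can be illustrated at the pixel level. This is a simplified sketch: it labels boundaries between horizontally adjacent pixels rather than whole border segments, and it assumes a north-up image (west on the left), sun from the east, and a shadow image coded 1 for shadow and 0 for lit. The function name and label codes are hypothetical.

```python
import numpy as np

RIDGE, VALLEY, NONE = 1, -1, 0

def label_ns_boundaries(Sw):
    """Label the boundary between each pixel and its right-hand (east)
    neighbor. Dark on the left and bright on the right gives a ridge;
    bright on the left and dark on the right gives a valley."""
    left, right = Sw[:, :-1], Sw[:, 1:]
    labels = np.full(left.shape, NONE, dtype=np.int8)
    labels[(left == 1) & (right == 0)] = RIDGE
    labels[(left == 0) & (right == 1)] = VALLEY
    return labels

# One image row: shadow, shadow, lit, lit, shadow.
Sw = np.array([[1, 1, 0, 0, 1]])
labels = label_ns_boundaries(Sw)
```

The output has one column fewer than the input because each entry describes the boundary between two neighboring pixels.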

East-west region boundaries are classified according to the labeling
of neighboring north-south boundaries as well as their orientation
relative to the east-west boundaries. As shown in Figure 10, an
east-west boundary takes the same label as the neighboring
north-south boundary with which it makes the larger angle. The
results of ridge-valley finding are shown in Figure 11. Assignment of
relative elevation to ridges and valleys is discussed in the next
section.

5. Relative Elevations of Ridge and Valley Segments

In this section, we describe how to estimate the relative elevations
of the ridges and valleys. First we describe a model which can do the
elevation assignment job; then we give the equations of elevation
assignment.

Assuming that we have a stream network in a mountainous area, and we
know where the biggest rivers are, we can trace the network, starting
from the biggest rivers, to find the flow directions of all the
stream segments because water always flows from higher locations to
lower locations. In other words, if the valley segments detected in
the last section formed a network, then starting from the visible
rivers detected in Section 3, we could trace the network and assign
relative elevations to all the segments. Unfortunately, the observed
valley segments do not form a network; there are many gaps. As shown
in Figure 12, if it is dark on the right and bright on the left of
stream Vb, then Vg cannot be detected due to the shadow on the right
of Vb, and a gap exists between Vb and a smaller stream Vs.




Figure  12  - The  gap  between  a smaller  and  a larger  stream. 

The knowledge that the cross-sections of valleys are V-shaped can be
used to bridge the gaps. If one looks at topographic maps, elevation
contours of valleys such as those in Figure 13 can frequently be
found. Thus, if one draws a line ab perpendicular to the valley Va,
the elevations increase from point o to point a, and also from point
o to point b. However, if a ridge point is encountered during the
process, the increase has to stop because the elevation starts to
decrease. Thus the route of growth is directed both by the valleys
and by the ridges, in other words, by global information.


Figure 13 - The elevation pattern of valleys and its relation to
elevation growing.

Applying this idea to Figure 12 and assuming that growing propagates
away from valley segment Vb, the end a of valley segment Vs will be
touched first by this growing, and it is deduced that end b of Vs
must be higher than end a. This is the basic idea for determining the
higher and lower ends of all the valley segments. The elevations of
all the points in one segment can be calculated if we know its slope.
On the other hand, ridges get elevations when the growing stops at
them. Now, we will give the simple equations of elevation assignment.


Our elevation growing model simply assumes that elevation increases
monotonically from valleys to ridges or along valley segments from
rivers to the saddles where a valley crosses a ridge. It can be used
for assigning initial relative elevations to each pixel. Because no
attempt is made to realistically account for the topographic shape of
the hillsides from the valley to the ridge, the initial relative
elevations will be more accurate for the ridge or valley labeled
pixels than for the non-ridge and non-valley labeled pixels. Section
6 discusses a more realistic procedure for hillside elevation
estimation using the ridge-valley elevations calculated in this
section.

There are two ways a pixel can be assigned an elevation, depending on
whether or not the pixel belongs to a valley segment. Let U be the
set of valley segments. Two slopes are associated with each valley
segment Vs in U: Sv(Vs) and Sp(Vs). Sv(Vs) is the slope along Vs
itself. Sp(Vs) is the slope of lines outside of Vs and perpendicular
to Vs.

The elevation growing model constructs the elevation function
El: Zr x Zc -> Ip, where Zr is the set of row coordinates, Zc is the
set of column coordinates, and Ip is the set of zero and positive
integers. If p is a pixel belonging to a valley segment Vs and pl is
the lower end pixel identified as in Figure 12, then

El(p) = El(pl) + Sv(Vs) * Dist(p, pl)

where Dist is the Euclidean distance between two pixels.

If p does not belong to any valley segment, and its elevation
originates from pixel pr of valley segment Vs, then

El(p) = El(pr) + Sp(Vs) * Dist(p, pr).
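
The two assignment equations translate directly into code. The sketch below is illustrative only: the function names, pixel coordinates, slopes and starting elevation are invented, and the result is left as a float rather than rounded to the integer range Ip.

```python
import math

def dist(p, q):
    """Euclidean distance between two pixels given as (row, col)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def valley_elevation(p, pl, el_pl, sv):
    """El(p) = El(pl) + Sv(Vs) * Dist(p, pl) for a pixel p on valley
    segment Vs whose lower end pixel is pl."""
    return el_pl + sv * dist(p, pl)

def off_valley_elevation(p, pr, el_pr, sp):
    """El(p) = El(pr) + Sp(Vs) * Dist(p, pr) for a pixel p off the
    valley whose elevation originates from valley pixel pr."""
    return el_pr + sp * dist(p, pr)

# Hypothetical numbers: the lower end of a valley segment sits at
# elevation 100, with slope 2 along the valley and 5 across it.
pl = (0, 0)
p_valley = (3, 4)                      # 5 pixels up the valley from pl
el_valley = valley_elevation(p_valley, pl, 100.0, sv=2.0)

p_side = (3, 7)                        # 3 pixels off the valley
el_side = off_valley_elevation(p_side, p_valley, el_valley, sp=5.0)
```

Chaining the two calls mirrors the growing process: elevations first propagate along a valley segment from its lower end, then outward from valley pixels toward the hillsides.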

In a small area, one can assume the elevations of visible rivers are
lowest. Once some initial elevation values are assigned to the pixels
of the valley segments classified as visible rivers, the elevations
of all the other pixels in the image window can be related to the
initial elevations of visible river segments by repeatedly using the
above two equations. The relative heights of valley segments created
by the elevation growing model are indicated by arrows in Figure 14,
and the ground truth is shown in Figure 15.

5.1. Identification of Peak Junctions

When several valleys and ridges point toward a junction, very often
this junction is a peak (peak at junction). The peak itself is formed
by the junction of several ridges that radiate outward from the peak.
(The idealized situation represented in Figure 16 shows four
symmetrically oriented ridges; in our area, real peaks are often
formed by junctions of two or three ridges.) Ridges of course are
separated by valleys, so the higher tips of valley segments tend to
point toward peaks. The ridge segments intersect to form a peak,
whereas valley segments tend to point towards peaks without actually
joining. In this subsection, we discuss the criteria which can be
used to identify peak junctions.

Because ridge segments are the major features of peaks, we impose the
constraint that the number of ridge segments at a junction is larger
than the number of valley segments. For many situations, it seems
reasonable to relate the heights of peaks to the lengths of ridges
that form the peaks. For our class of topographic forms, for example,
it is unlikely that very high peaks can be formed by the intersection
of very short ridges. As a result, to exclude very low peaks and
false peaks from consideration, we impose a rather arbitrary
constraint upon the definition of peaks. Currently, we define a peak
junction as a junction composed of four border segments, with the
number of its ridge segments larger than the number of valley
segments, and the length of its longest ridge segment longer than 800
meters. The peaks thus located in Figure 1 are marked as triangles in
Figure 11b. The correspondence between this result and the
topographical map is surprisingly good.


Figure 16 - Idealized relationships between peaks, valleys, and
ridges.
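
The peak junction definition can be written as a small predicate. This is a sketch under assumptions: the `BorderSegment` record and its field names are hypothetical, and segment lengths are assumed to be already converted to meters.

```python
from dataclasses import dataclass

@dataclass
class BorderSegment:
    kind: str        # "ridge" or "valley"
    length_m: float  # segment length in meters

def is_peak_junction(segments, n_segments=4, min_ridge_m=800.0):
    """A junction is a peak junction if it is composed of n_segments
    border segments, has more ridge than valley segments, and its
    longest ridge segment is longer than min_ridge_m."""
    ridges = [s for s in segments if s.kind == "ridge"]
    valleys = [s for s in segments if s.kind == "valley"]
    if len(segments) != n_segments or len(ridges) <= len(valleys):
        return False
    return max(s.length_m for s in ridges) > min_ridge_m

# Three ridges (one longer than 800 m) and one valley.
junction = [BorderSegment("ridge", 900.0), BorderSegment("ridge", 300.0),
            BorderSegment("ridge", 200.0), BorderSegment("valley", 500.0)]
peak = is_peak_junction(junction)
```

The thresholds are exposed as parameters because the text itself describes the constraint as rather arbitrary.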

6. Interpolating Between Ridges and Valleys

In the last section all pixels were assigned elevations, but because
the realistic shape of the hillsides from valleys to ridges was not
taken into account, only the relative elevations of the ridges and
valleys are held to be accurate. In this section we describe a few
interpolation procedures which permit more realistic elevation
assignment to non-valley and non-ridge pixels.

The first interpolating surface has the given elevation values at
ridges and valleys and has a 3 x 3 digital Laplacian of zero at all
non-ridge and non-valley pixels. This will be referred to as the
Laplacian surface. The system of linear equations which this
constraint gives rise to can be written as

A x = b .

The vector x is the solution and represents the values to be assigned
to each 'variable' (non-ridge, non-valley) pixel in the elevation
model. The A matrix is defined by applying the digital Laplacian mask
operator (Figure 17) to each variable pixel. A mask operator is
applied to a pixel by placing the mask over the image so that the
central (large positive) mask value is directly over the pixel whose
value is to be computed. The pixel value is changed to make the sum
of the mask values times the corresponding image values under them
equal to zero. For the Laplacian surface only, Neumann boundary
conditions are enforced along the outside rows and columns of the
elevation model image. That is, the outermost row or column is
repeated so that the mask operator can be applied to the outside
pixels. There is one row in A for each variable pixel in the
elevation model and one coefficient value in that row for each
variable. A is a sparse matrix since no variable is constrained by
more than four other variables (due to the definition of the digital
Laplacian mask operator). The b vector is the right hand side of each
of the linear equations in the system. The constants on the left hand
side of each equation (that result from applying the Laplacian
operator to a variable pixel that has a known pixel 4-adjacent to it)
are carried to the right hand side and appear in b. For equations
representing variable pixels not 4-adjacent to known pixels, the
corresponding b element is zero.

      -1

  -1   4  -1

      -1

Figure 17 - A digital Laplacian mask
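
The Laplacian surface can be approximated without assembling A explicitly. The sketch below swaps in a plain Jacobi relaxation for the direct solution of A x = b: each variable pixel is repeatedly replaced by the mean of its four neighbors, which is the fixed point of the mask in Figure 17, and the outermost rows and columns are repeated to imitate the Neumann boundary treatment. The function name, iteration limit and tolerance are assumptions.

```python
import numpy as np

def laplacian_surface(elev, known, iters=5000, tol=1e-6):
    """Fill variable pixels so that 4*z minus the sum of the four
    neighbors is (approximately) zero, keeping known pixels fixed.

    elev  : 2-D array holding elevations at known (ridge/valley) pixels
    known : boolean mask, True where the elevation is given
    """
    z = elev.astype(float).copy()
    z[~known] = z[known].mean()              # initial guess
    for _ in range(iters):
        p = np.pad(z, 1, mode="edge")        # repeat outer rows/columns
        avg = 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])
        new = np.where(known, z, avg)
        converged = np.abs(new - z).max() < tol
        z = new
        if converged:
            break
    return z

# Toy window: a "valley" column at elevation 0 and a "ridge" column at 4.
elev = np.zeros((3, 5))
elev[:, 4] = 4.0
known = np.zeros((3, 5), dtype=bool)
known[:, 0] = True
known[:, 4] = True
surface = laplacian_surface(elev, known)
```

For this window the relaxed surface rises linearly from the valley column to the ridge column. Relaxation converges slowly on large images, so a sparse direct or multigrid solver would be the more practical choice there.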


The second interpolating surface has the given boundary values and
minimizes the quadratic variation of the resulting surface [Grimson,
1981]. The boundary conditions with which the surface must agree are
depth values along the zero-crossings. If the surface elevation
function is E and subscripts denote partial differentiation, then the
final surface E minimizes

$$\iint \left( E_{xx}^2 + 2E_{xy}^2 + E_{yy}^2 \right) dx\, dy$$

Since the surface function can be converted to a discrete grid format, the differential operators can be converted to difference operators, and the double integral can be converted to a double summation, the minimizing surface can be found by setting up a corresponding discrete set of linear equations

Q x = b . 

The x and b vectors have the same meaning as in the Laplacian case and are constructed similarly. The Q matrix is likewise similar to the A matrix of the Laplacian. Instead of using Neumann boundary conditions at the edge of the image, the quadratic variation surface is defined by using special masks to fit the rows and columns near the outside edges. The six masks (Figure 18) are rotated as necessary and applied only to the appropriate variable pixels of the elevation image to define Q. Mask two is applied to corner pixels, mask three is applied to pixels in the outside row or column that are adjacent to a corner pixel, mask four is applied to other pixels in the outside rows and columns, mask five is applied to pixels in the next-to-outside rows and columns that are 8-adjacent to corner pixels, mask six is applied to other pixels in the next-to-outside rows and columns, and mask one is applied to all other variable pixels in the image.
Figure 18 - Six masks for the quadratic variation method.
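As an illustration of the quantity being minimized, the following sketch evaluates a discrete quadratic variation on a grid. The particular difference operators (second differences for E_xx and E_yy, a four-point cross difference for E_xy) are an assumption chosen for illustration; the masks of Figure 18 are not reproduced here.

```python
def quadratic_variation(z):
    """Discrete sum of E_xx^2 + 2*E_xy^2 + E_yy^2 over a grid z[row][col]."""
    rows, cols = len(z), len(z[0])
    total = 0.0
    for r in range(rows):
        for c in range(1, cols - 1):
            exx = z[r][c - 1] - 2.0 * z[r][c] + z[r][c + 1]
            total += exx * exx
    for r in range(1, rows - 1):
        for c in range(cols):
            eyy = z[r - 1][c] - 2.0 * z[r][c] + z[r + 1][c]
            total += eyy * eyy
    for r in range(rows - 1):
        for c in range(cols - 1):
            exy = z[r + 1][c + 1] - z[r + 1][c] - z[r][c + 1] + z[r][c]
            total += 2.0 * exy * exy
    return total


plane = [[2.0 * c + 3.0 * r for c in range(3)] for r in range(3)]
saddle = [[float(r * c) for c in range(3)] for r in range(3)]
qv_plane = quadratic_variation(plane)    # a tilted plane has no curvature
qv_saddle = quadratic_variation(saddle)  # only the cross term contributes
```

A planar surface gives zero, so the minimization penalizes curvature rather than slope.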
The third kind of interpolating surface can be created without using any mask. For each non-boundary pixel, we first find its distances to the nearest valley pixel and the nearest ridge pixel. From these distances and the elevations at the nearest valley pixel and nearest ridge pixel, either a linear, cubic, or fifth order fit interpolation can be used to calculate the elevation of the non-boundary pixel. If the cubic fit is used, the first order derivative is zero at ridge and valley pixels. If the fifth order fit is used, both the first and second order derivatives are zero at ridge and valley pixels. The resulting images, with higher brightness indicating higher elevation, and the corresponding surface plots are shown in Figure 19. The image and surface plot of the elevations read from digital terrain tape [NCIC, 1980] for this area are shown in Figure 20. The LANDSAT images reconstructed by using the diffuse light image (Figure 5), the reflectance image (Figure 6), the elevation model (Figure 19a), and an artificial sun at specified azimuth and elevation angles are shown in Figure 21. They are reasonable reconstructions.
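The three fits of the third method can be sketched as follows. The parametrization is an assumption: the position of a pixel between its nearest valley and ridge pixels is taken to be t = d_valley / (d_valley + d_ridge), and the cubic and fifth order blends are the standard polynomials with zero first (and, for the fifth order, second) derivatives at t = 0 and t = 1. The function names are hypothetical.

```python
def blend(t, order):
    """Blend weight rising from 0 at the valley (t = 0) to 1 at the ridge (t = 1)."""
    if order == "linear":
        return t
    if order == "cubic":
        return 3.0 * t ** 2 - 2.0 * t ** 3                   # zero slope at both ends
    if order == "fifth":
        return 10.0 * t ** 3 - 15.0 * t ** 4 + 6.0 * t ** 5  # zero slope and curvature
    raise ValueError("order must be 'linear', 'cubic', or 'fifth'")


def interpolate_elevation(d_valley, d_ridge, z_valley, z_ridge, order="linear"):
    """Elevation of a pixel from its distances to the nearest valley and ridge pixels."""
    t = d_valley / (d_valley + d_ridge)
    return z_valley + (z_ridge - z_valley) * blend(t, order)


midpoints = [interpolate_elevation(1.0, 1.0, 100.0, 200.0, o)
             for o in ("linear", "cubic", "fifth")]
near_ridge = interpolate_elevation(3.0, 1.0, 100.0, 200.0, "cubic")
```

All three fits agree at the valley, the ridge, and the midpoint; they differ in how quickly the surface flattens near ridge and valley lines.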
Figure 19a - Elevation model by Method 1, Laplacian mask

Figure 19c - Elevation model by Method 3, linear fit

Figure 19d - Elevation model by Method 3, cubic fit

Figure 19e - Elevation model by Method 3, fifth order fit

Figure 20 - Elevation model from digital terrain tape
2. Conclusion

In order to reconstruct 3D spatial information from LANDSAT imagery, we need to identify shadowed and directly lit pixels as well as local slope information. A model involving reflectance, topography, diffuse light, and haze has been discussed and a technique for computing this information has been given. The shadow, reflectance, and elevation images look quite good when compared with the topographic map of the same area and our understanding of the vegetation surface cover.

Once the shadow image and local slope information are determined, ridge and valley segments are detected and then an elevation growing model is used to assign relative elevations to them. Interpolation generates surface elevation at all locations from the known values at ridge and valley segments.
ABSTRACT

This paper considers mixture models of the form

    $h = \sum_{j=1}^{M} \lambda_j f^j_{\theta_j}$,

where $\theta_j$ is a translation parameter. An approach is discussed which makes use of a Caratheodory theorem on the trigonometric moment problem to determine $M$ and $\theta_j$, $j=1,2,\ldots,M$. This theorem is also applied to show that translates of many common distributions lead to identifiable mixtures.
INTRODUCTION

Let $\mathcal{F} = \{f_\zeta : \zeta \in \mathbb{R}^N\}$ be a family of probability density functions and let $G$ be a distribution function on $\mathbb{R}^N$, where $\mathbb{R}^N$ is the set of real vectors of dimension $N$. For the given $G$ we define a mixture density $h$ as

    (1)   $h = \int f_\zeta \, dG(\zeta)$

Since all the members of $\mathcal{F}$ are used in this definition, it makes sense to say that according to equation (1) $\mathcal{F}$ defines a mapping, say $F$, from the set of all $G$-distributions, say $\mathcal{G}$, to the set of all induced $h$-densities, say $\mathcal{H}$. If $F : \mathcal{G} \to \mathcal{H}$ is one-to-one and onto then we say that $\mathcal{H}$ is identifiable. This formulation is essentially due to Teicher [1]. Thus, identifiability implies that, for a given mixture density $h$, a knowledge of the family $\mathcal{F}$ will allow us to uniquely determine $G$. This has practical implications for estimating the proportion of a material class on the ground using remotely sensed observations of that material. To illustrate the point, we offer the following example.

Suppose we are given spectral measurements, $x$, of points (pixels) on the ground which have been obtained from a satellite multispectral scanner system. We imagine that these $x$'s are observations on some random variable $X$ distributed according to density $h$. Suppose that through experimentation we have found that any given material class on the ground gives rise to measurements that are normally distributed and that in a given region the mixture model that applies is:

    (2)   $h(x) = \sum_{j=1}^{M} \lambda_j \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(1/2)(x-\mu_j)^2/\sigma^2}$.
With reference to equation (1) we see that in this example $G$ assigns a point probability $\lambda_j$ to the points $(\mu_j, \sigma)$, $j=1,2,\ldots,M$. This is an example of a finite mixture model. Since the $M$ material classes are associated with the parameters $(\mu_j, \sigma)$, $j=1,2,\ldots,M$, $\lambda_j$ can be considered as the a priori probability of observing the $j$-th class, or equivalently as the proportion of the $j$-th class present in the given region. The primary aim is to determine the $\lambda_j$-values but to do that one has to estimate $M$, $\mu_j$, $\sigma$, $j=1,2,\ldots,M$. Studies within the AgRISTARS program suggest that a multivariate version of the model given in equation (2) fits reasonably well to agricultural data, as well as to data from natural vegetative classes, cf. Lennington et al. [2]. In those studies maximum likelihood estimation methods were used to estimate the $\lambda_j$'s, the means, and the covariances. The number of classes, $M$, was determined by applying a heuristically derived algorithm.
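To make the model of equation (2) concrete, the sketch below evaluates a univariate normal mixture with a common standard deviation and checks numerically that it integrates to one. The proportions, means, and the function name `normal_mixture_density` are illustrative assumptions, not values taken from the studies cited above.

```python
import math


def normal_mixture_density(x, lambdas, mus, sigma):
    """h(x) = sum_j lambda_j * N(x; mu_j, sigma^2), as in equation (2)."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return sum(lam * norm * math.exp(-0.5 * ((x - mu) / sigma) ** 2)
               for lam, mu in zip(lambdas, mus))


lambdas = [0.5, 0.3, 0.2]   # class proportions (sum to one)
mus = [1.0, 1.5, 2.0]       # class means
sigma = 1.0                 # common standard deviation

# Riemann sum over [-10, 12]; the tails outside this range are negligible.
step = 0.01
area = sum(normal_mixture_density(-10.0 + i * step, lambdas, mus, sigma) * step
           for i in range(2201))
```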

In this paper we consider a finite mixture model of the form

    (3)   $h = \sum_{j=1}^{M} \lambda_j f^j_{\theta_j}$

where $\theta_j$ is a location parameter and $f^j_{\theta_j}$ may depend upon other parameters (this is the reason for using the superscript "j") in addition to $\theta_j$. In the simplest case we have the pure translation family, $\mathcal{F}_f = \{f_\theta : \theta \in \mathbb{R}\}$, where each member is a translate of some given $f$. The model in equation (2) is a specific example.

Our approach will make use of a theorem of Caratheodory on a trigonometric moment problem as discussed in Grenander and Szegő [3]. Of particular interest will be the constructive proof (due to Szegő) which provides a means for computing $M$ and $\theta_j$, $j=1,2,\ldots,M$ in equation (3). We begin by discussing the pure translation case. For that case it is possible to compute the proportions $\lambda_j$ in addition to $M$ and $\theta_j$ for $j=1,2,\ldots,M$. Since in the more general case each $f^j_{\theta_j}$ can depend upon more than just a location parameter, our methods do not lead to values for the $\lambda_j$'s. However, for certain families of densities knowing $M$ and each $\theta_j$ may simplify the estimation of these other parameters (e.g., see Redner [4]).

THE  PURE  TRANSLATION  CASE 


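For $f_{\theta_j} \in \mathcal{F}_f$ let

    (4)   $h = \sum_{j=1}^{M} \lambda_j f_{\theta_j}$

Since $f$ is a density with a characteristic function $F$, the characteristic function of $h$ is [note: in this paper $\omega$ is in radians]

    $H(\omega) = \sum_{j=1}^{M} \lambda_j F(\omega)\, e^{i\omega\theta_j}$

For any $\omega$ that is not a zero of $F$,

    (5)   $H(\omega)/F(\omega) = \sum_{j=1}^{M} \lambda_j e^{i\omega\theta_j}$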
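The factor $e^{i\omega\theta_j}$ in these expressions comes from a change of variable in the characteristic function of a translate. Writing $f_{\theta_j}(x) = f(x-\theta_j)$, which is the convention assumed here, the step can be written out as:

```latex
\int_{-\infty}^{\infty} e^{i\omega x} f(x-\theta_j)\,dx
  = \int_{-\infty}^{\infty} e^{i\omega(u+\theta_j)} f(u)\,du
  = e^{i\omega\theta_j} F(\omega),
\qquad\text{so}\qquad
H(\omega) = F(\omega) \sum_{j=1}^{M} \lambda_j e^{i\omega\theta_j}.
```

Dividing by $F(\omega)$ wherever it is nonzero removes all dependence on the shape of $f$ and leaves a trigonometric sum in the unknown translations.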


The following theorem due to Caratheodory applies to the form given by equation (5).

THEOREM 1

Let $c_1, c_2, \ldots, c_n$ be complex constants where $c_\nu \neq 0$ for some $\nu$. There exists an integer $M$, $1 \le M \le n$, and constants $\lambda_j$, $e^{i\theta_j}$ such that each $\lambda_j$ is real and positive, $e^{i\theta_k} \neq e^{i\theta_j}$ for $k \neq j$, and

    $c_\nu = \sum_{j=1}^{M} \lambda_j e^{i\nu\theta_j}$,   $\nu = 1,2,\ldots,n$,

where $M$, $\lambda_j$, and $\theta_j$ are unique.

For a proof see Grenander and Szegő [3] pages 56 to 61.

COROLLARY 1

$\mathcal{F}_f$ leads to an identifiable mixture.

PROOF:

Since $H(\omega)/F(\omega) = \sum_{j=1}^{M} \lambda_j e^{i\omega\theta_j}$, this representation must be unique by Theorem 1.

This corollary, which is an immediate consequence of the Caratheodory theorem, was also proved in a different manner by Yakowitz and Spragins [5]. We now consider the determination of $M$ and $\theta_j$, $j=1,2,\ldots,M$ by methods developed by Szegő [3].

Since $F$ is (uniformly) continuous and $F(0)=1$, there exists an interval about $\omega=0$ for which the magnitude of $F$ is positive. Let $(-b,b)$ be the largest such interval and for $k=0,1,2,\ldots,n$ let $\omega_k = k\beta$ where

    $\beta = \min\left( \frac{2\pi}{(n+1)\max|\theta_j|},\ \frac{b}{n+1} \right)$
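The choice of the sampling step can be sketched in a few lines. Since the translations are unknown in practice, the sketch assumes that a bound on max|θ_j| is supplied by the user; the names `choose_beta` and `theta_bound` are hypothetical.

```python
import math


def choose_beta(theta_bound, n, b=math.inf):
    """Sampling step for omega_k = k * beta, k = 0..n.

    theta_bound: assumed upper bound on max |theta_j|
    b: half-width of the largest interval about 0 on which |F| > 0
       (infinite for a normal density, whose characteristic function has no zeros)
    """
    return min(2.0 * math.pi / ((n + 1) * theta_bound), b / (n + 1))


beta_normal = choose_beta(theta_bound=2.0, n=3)           # no zeros of F to avoid
beta_limited = choose_beta(theta_bound=2.0, n=3, b=2.0)   # zeros of F at +/- 2
```

The first term keeps every phase kβθ_j below 2π in magnitude so that distinct translations give distinct points on the unit circle; the second keeps every sample ω_k inside (-b, b).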
For these choices of $\omega_k$, let $c_k = H(\omega_k)/F(\omega_k)$, $c_{-k} = \overline{H(\omega_k)/F(\omega_k)}$ and consider the Hermitian matrix

    $\Phi = [\,c_{\ell-k}\,]$,   $k,\ell = 0,1,\ldots,n$.

From (5) with $\omega_k = k\beta$,

    $\Phi = \sum_{j=1}^{M} \lambda_j \left[\, e^{i(\ell-k)\beta\theta_j} \,\right]$.

Thus $\Phi$ is a linear combination of $M$ rank one matrices, and since the $\lambda_j$ are unique the rank of $\Phi$ must be $M$. The Toeplitz form $v^*\Phi v$ is

    $v^*\Phi v = \sum_{j=1}^{M} \lambda_j \left| \sum_{k=0}^{n} v_k \left(e^{i\beta\theta_j}\right)^k \right|^2$

Since $n>M$, there must be at least one zero eigenvalue of $\Phi$. Let $v$ be the corresponding eigenvector, i.e., $v^*\Phi v=0$. Since $\lambda_j>0$ for $j=1,2,\ldots,M$ the complex polynomial

    $P(z) = \sum_{k=0}^{n} v_k z^k$

where $z = e^{i\beta\theta}$, must have roots at $z_j = e^{i\beta\theta_j}$.

We see therefore that the rank of $\Phi$ determines the number of distinct translates and the roots of $P(z)$ are the distinct translations. The proportions, $\lambda_1, \lambda_2, \ldots, \lambda_M$, can be determined by substituting specific x-values in equation (4) and solving the resulting system of linear equations.
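The procedure of this section can be sketched for the smallest interesting case, two translates (M = 2) and n = 2. Several simplifications are assumptions made for illustration: the moments c_k are generated exactly from known parameters rather than from data, the null vector of the 3 by 3 matrix is obtained from a cross product of two rows instead of an eigenvalue decomposition, and the proportions are read off from c_0 and c_1 of equation (5) rather than from x-values in equation (4). All function names are hypothetical.

```python
import cmath


def moments(lambdas, thetas, beta, n):
    """c_k = H(k*beta)/F(k*beta) = sum_j lambda_j * exp(i*k*beta*theta_j), k = 0..n."""
    return [sum(lam * cmath.exp(1j * k * beta * th)
                for lam, th in zip(lambdas, thetas))
            for k in range(n + 1)]


def recover_two_translates(c, beta):
    """Recover (theta_1, theta_2) and (lambda_1, lambda_2) from c_0, c_1, c_2."""
    # Entry (k, l) of the Hermitian Toeplitz matrix is c_{l-k}, with c_{-m} = conj(c_m).
    def entry(k, l):
        return c[l - k] if l >= k else c[k - l].conjugate()

    row0 = [entry(0, l) for l in range(3)]
    row1 = [entry(1, l) for l in range(3)]
    # With exact moments the matrix has rank 2, so the (bilinear) cross product
    # of two independent rows spans its null space.
    v = [row0[1] * row1[2] - row0[2] * row1[1],
         row0[2] * row1[0] - row0[0] * row1[2],
         row0[0] * row1[1] - row0[1] * row1[0]]
    # Roots of P(z) = v0 + v1*z + v2*z^2 by the quadratic formula.
    disc = cmath.sqrt(v[1] * v[1] - 4.0 * v[2] * v[0])
    roots = sorted([(-v[1] + disc) / (2.0 * v[2]), (-v[1] - disc) / (2.0 * v[2])],
                   key=cmath.phase)
    thetas = [cmath.phase(z) / beta for z in roots]
    # Proportions from c_0 = l1 + l2 and c_1 = l1*z1 + l2*z2.
    z1, z2 = roots
    lam2 = ((c[1] - z1 * c[0]) / (z2 - z1)).real
    lam1 = c[0].real - lam2
    return thetas, [lam1, lam2]


c = moments([0.6, 0.4], [0.3, 1.1], beta=1.0, n=2)
thetas, lambdas = recover_two_translates(c, beta=1.0)
```

With noisy moments the matrix is only approximately rank deficient, which is why the text works with eigenvalues and eigenvectors; the cross product is adequate only for the exact case shown here.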


THE  GENERAL  CASE 

We now consider the general form given by equation (3). For this case we choose families of the form

    $\mathcal{F} = \{ f^\alpha_\theta : \theta \in P,\ \alpha \in \mathbb{R}^N \}$

where $P$ is the set of rational numbers and $\mathbb{R}^N$ is the set of N-dimensional real vectors. We will show that at least for certain cases, e.g., when $f^\alpha_\theta$ is an exponential, double exponential, gamma, or beta, this family leads to unique determination of $\theta$ from a mixture.

Since $\mathcal{F}$ is not generated by one function as was $\mathcal{F}_f$, we cannot proceed exactly as we did in the previous section. Our approach for this case will exploit the limiting behavior of $F_\alpha(\omega)$ as $\omega$ gets large, where $F_\alpha$ is the characteristic function of $f^\alpha_\theta$.

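THEOREM 2

Let $h = \sum_{j=1}^{M} \lambda_j f^j_{\theta_j}$, $f^j_{\theta_j} \in \mathcal{F}$, and let the characteristic function of $f^j$ be of the form

    $F_j(\omega) = \frac{a^j_0 + a^j_1 (i\omega) + \cdots + a^j_p (i\omega)^p}{b^j_0 + b^j_1 (i\omega) + \cdots + b^j_q (i\omega)^q} + o\!\left(\frac{1}{\omega^{q-p+1}}\right)$

with $q>p$, and $a^j_p / b^j_q > 0$ for each $j$ or $< 0$ for each $j$.

Let $\omega_{kn} = k\beta + 2\pi n$, $\beta = \frac{2\pi}{(K+1)\max|\theta_j|}$, $k=0,1,\ldots,K$, $K>M$, $n=1,2,\ldots$.

a)  If there exists a vector $v$ so that for $\theta_1, \theta_2, \ldots, \theta_M$

    i)   $\sum_{k=1}^{K} v_k e^{ik\beta\theta_j} = 0$

then

    $\lim_{n\to\infty} \left| \sum_{k=1}^{K} \sum_{\ell=1}^{K} \bar{v}_k v_\ell\, (i\omega_{\ell-k,n})^{q-p}\, H(\omega_{\ell-k,n}) \right| = 0$

b)  If there exists a vector $v$ so that

    ii)   $\lim_{N\to\infty} \sup_{n>N} \left| \sum_{k=1}^{K} \sum_{\ell=1}^{K} \bar{v}_k v_\ell\, (i\omega_{\ell-k,n})^{q-p}\, H(\omega_{\ell-k,n}) \right| = 0$

then i) holds for $\theta_1, \theta_2, \ldots, \theta_M$.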

Before we prove this theorem, consider the example density functions given in Table 1. We see that the exponential and gamma densities each fit the forms given by Theorem 2. In the case of the gamma density $\gamma$ can be any positive number but $n$ must be known. In the case of the beta density, $\beta_{m,n}$, whose characteristic function is $B_{m,n}$, notice that $B_{m,n}(\omega) = B_{n,m}(-\omega)\, e^{i\omega}$. That is, when $m$ and $n$ are reversed the characteristic function can be gotten from the original characteristic function by changing $\omega$ to $-\omega$ and multiplying by $e^{i\omega}$. Thus, for example, if we have $\beta_{n,1}(x) = \frac{(n+2)!}{n!}\, x^n (1-x)$ then the leading term in the characteristic function contains $e^{i\omega}$. To make Theorem 2 apply in this case we must multiply $H(\omega)$ by $e^{-i\omega}$ or replace $\theta_j$ by $\theta_j+1$ for $j=1,2,\ldots,M$. In the case of $\beta_{1,n}(x) = \frac{(n+2)!}{n!}\, x (1-x)^n$ Theorem 2 applies directly.

In addition, we could consider a convolution of a given density, $f$, with members of Table 1, i.e., the family $\mathcal{F}''$ whose members are of the form $f * f^\alpha_\theta$, $f^\alpha_\theta \in \mathcal{F}$ and $f^\alpha_\theta$ a member of Table 1 (where $*$ denotes convolution). To make this theorem apply we need to modify the theorem slightly by considering $H(\omega)/F(\omega)$ in place of $H(\omega)$ where $F$ is the characteristic function of $f$.
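The limiting behavior that Theorem 2 relies on can be checked numerically. The sketch below assumes the standard characteristic functions of an untranslated exponential density b e^{-bx} and of a gamma density x^n e^{-x/γ} / (n! γ^{n+1}); multiplying by (iω)^{q-p} should approach the ratio of leading coefficients, -b and (-1/γ)^{n+1} respectively. The function names are hypothetical.

```python
def exponential_cf(w, b):
    """Characteristic function of the density b*exp(-b*x), x > 0."""
    return b / (b - 1j * w)


def gamma_cf(w, gamma, n):
    """Characteristic function of x^n * exp(-x/gamma) / (n! * gamma^(n+1)), x > 0."""
    return (1.0 - 1j * gamma * w) ** (-(n + 1))


def scaled_cf(cf, w, order):
    """(i*w)^order * cf(w), the quantity whose large-w limit is a_p / b_q."""
    return (1j * w) ** order * cf(w)


w = 1.0e6
exp_limit = scaled_cf(lambda u: exponential_cf(u, 2.0), w, 1)   # q - p = 1
gam_limit = scaled_cf(lambda u: gamma_cf(u, 4.0, 3), w, 4)      # q - p = n + 1 = 4
```

In both cases the scaled characteristic function is nearly constant for large ω, so the mixture's scaled characteristic function behaves like the trigonometric sum handled in the pure translation case.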

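PROOF OF THEOREM 2:

    $H(\omega) = \sum_{j=1}^{M} \lambda_j F_j(\omega)\, e^{i\omega\theta_j}$

and

    $(i\omega)^{q-p} F_j(\omega) = \frac{\frac{a^j_0}{(i\omega)^p} + \frac{a^j_1}{(i\omega)^{p-1}} + \cdots + a^j_p}{\frac{b^j_0}{(i\omega)^q} + \frac{b^j_1}{(i\omega)^{q-1}} + \cdots + b^j_q} + o\!\left(\frac{1}{\omega}\right)$

Thus

    $Q(\omega) \equiv (i\omega)^{q-p} H(\omega) - \eta(\omega) = \sum_{j=1}^{M} \lambda_j \frac{a^j_p}{b^j_q}\, e^{i\omega\theta_j}$

where

    $\eta(\omega) = \sum_{j=1}^{M} \lambda_j \left( (i\omega)^{q-p} F_j(\omega) - \frac{a^j_p}{b^j_q} \right) e^{i\omega\theta_j}$

Now

    (6)   $Q(\omega_{kn}) = \sum_{j=1}^{M} \lambda_j \frac{a^j_p}{b^j_q}\, e^{i2\pi n\theta_j}\, e^{ik\beta\theta_j}$

Thus given a vector $v$ that satisfies i)

    (7)   $\left| \sum_{k=1}^{K} \sum_{\ell=1}^{K} \bar{v}_k v_\ell\, Q(\omega_{\ell-k,n}) \right| = \left| \sum_{j=1}^{M} \lambda_j \frac{a^j_p}{b^j_q}\, e^{i2\pi n\theta_j} \left| \sum_{k=1}^{K} v_k e^{ik\beta\theta_j} \right|^2 \right|$

and since $\lim_{n\to\infty} |\eta(\omega_{kn})| = 0$ and $\sum_{k=1}^{K} v_k e^{ik\beta\theta_j} = 0$, the assertion in a) holds.

Next consider b). We assume $a^j_p / b^j_q > 0$ for each $j$; otherwise, if $a^j_p / b^j_q < 0$ for each $j$ we need only multiply equation (6) by $-1$ before we begin our argument. Given $\epsilon > 0$ suppose that in equation (7) we can find a vector $v$ that satisfies ii) but

    $\left| \sum_{k=1}^{K} v_k e^{ik\beta\theta_j} \right| > \epsilon$

Since the $\theta_j$ are rational, they are of the form $\theta_j = n_{1j}/n_{2j}$ where $n_{1j}$, $n_{2j}$ are integers. Thus, we can choose a subsequence of $(n)$ of the form

    $(n') = \left( \left( \prod_{j=1}^{M} n_{2j} \right) \ell \right)$,   where $\ell = 1,2,\ldots$

so that $2\pi n'\theta_j$ is of the form $\pm 2\pi t$ where $t$ is an integer. Hence $e^{i2\pi n'\theta_j} = 1$ for all $j$ which means

    $\left| \sum_{j=1}^{M} \lambda_j \frac{a^j_p}{b^j_q}\, e^{i2\pi n'\theta_j} \left| \sum_{k=1}^{K} v_k e^{ik\beta\theta_j} \right|^2 \right| > \epsilon^2 \sum_{j=1}^{M} \lambda_j \frac{a^j_p}{b^j_q} > 0$

But this implies (noting that over $(n')$ all limits exist)

    $0 = \lim_{N\to\infty} \sup_{n>N} \left| \sum_{k=1}^{K} \sum_{\ell=1}^{K} \bar{v}_k v_\ell\, Q(\omega_{\ell-k,n}) \right| \ge \lim_{n'\to\infty} \left| \sum_{k=1}^{K} \sum_{\ell=1}^{K} \bar{v}_k v_\ell\, Q(\omega_{\ell-k,n'}) \right| > \epsilon^2 \sum_{j=1}^{M} \lambda_j \frac{a^j_p}{b^j_q} > 0$

which is a contradiction, and this completes the proof.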


Theorem 2 says that any vector, $v$, for which

    $\left| \sum_{k} \sum_{\ell} \bar{v}_k v_\ell\, (i\omega_{\ell-k,n})^{q-p}\, H(\omega_{\ell-k,n}) \right| = \gamma_n$

goes to zero as $n$ gets large must be a vector that makes the Toeplitz form $\left| \sum_{k} v_k \left(e^{i\beta\theta_j}\right)^k \right|^2$ zero. Hence the theory for finding $M$ and $\theta_j$, $j=1,2,\ldots,M$ discussed in the previous section applies here for large $\omega$ provided we decide on $M$ by looking at eigenvalues $\gamma_n$ whose magnitudes are small and also account for the fact that $\{H(\omega_{\ell-k,n})\}$ is not necessarily Hermitian, but "approximately so."

We now show that the densities given in Table 1 lead to identifiable mixtures. Recall that in a finite mixture identifiability implies that the representation given by equation (3) is unique. By the methods we have discussed so far, Theorem 1 will only guarantee that $M$ and $\theta_j$ for $j=1,2,\ldots,M$ are unique. The problem here is that the $\lambda_j$ values and some of the nontranslation parameters appear as products in the limiting form of the characteristic function. Thus, to guarantee identifiability we must consider a subfamily of $\mathcal{F}$ in which the nontranslation parameters in the density $f^j_{\theta_j}$ are in one-to-one correspondence with the translation parameters.
Let $\mathcal{F}'$ be a slight modification of the family $\mathcal{F}$. Namely, let

    $\mathcal{F}' = \{ f^\alpha_\theta : \theta \in P,\ \alpha \in \mathbb{R}^N,\ \theta = \theta' \text{ implies } f^\alpha_\theta = f^{\alpha'}_{\theta'} \}$

Thus $\mathcal{F}'$ is the subfamily of $\mathcal{F}$ for which no two members can have the same translation value, $\theta$, but still be unequal.


COROLLARY 2

Let $f^\alpha_\theta \in \mathcal{F}$ have a characteristic function as given by Theorem 2.

a)  $\mathcal{F}$ leads to a unique determination of $M$ and the translation parameters.

b)  $\mathcal{F}'$ leads to an identifiable mixture.

PROOF:

We assume $a^j_p / b^j_q > 0$, as we did in Theorem 2; otherwise, we consider $-H(\omega)$ in place of $H(\omega)$. Let $h = \sum_{j=1}^{M} \lambda_j f^j_{\theta_j}$. Then, as in equation (6)

    $Q(\omega_{kn'}) = \sum_{j=1}^{M} \lambda_j \frac{a^j_p}{b^j_q}\, e^{i2\pi n'\theta_j}\, e^{ik\beta\theta_j}$

with $\theta_j = n_{1j}/n_{2j}$, $n' = \left( \prod_{j=1}^{M} n_{2j} \right) \ell$, $\ell = 1,2,\ldots$. Thus (following the proof of Theorem 2)

    $\lim_{n'\to\infty} (i\omega_{kn'})^{q-p} H(\omega_{kn'}) = \sum_{j=1}^{M} \lambda_j \frac{a^j_p}{b^j_q}\, e^{ik\beta\theta_j}$,   $\lambda_j \frac{a^j_p}{b^j_q} > 0$.

And since the right side of this expression satisfies Theorem 1, the representation is unique. This proves assertion a).

In particular $\lambda_j \frac{a^j_p}{b^j_q}$, $j=1,2,\ldots,M$ are unique. But since the $\theta_j$, $j=1,2,\ldots,M$ are also unique, it follows from the definition of $\mathcal{F}'$ that $a^j_p$, $b^j_q$, $j=1,2,\ldots,M$ are unique and therefore the $\lambda_j$, $j=1,2,\ldots,M$ are unique. This completes the proof.

In the case of the betas $\beta_{n,1}$ and $\beta_{n,2}$ given in Table 1, we need only consider $h(x+1) = \sum_{j=1}^{M} \lambda_j f^j(x+1-\theta_j)$ in place of the above form for $h$. In the case of $\beta_{1,n}$ and $\beta_{2,n}$, we need not translate $h$.

NUMERICAL  EXAMPLES 

In order to explore the numerical behavior of these methods, simulation studies were conducted. Some examples of the simulation results are presented. The characteristic function of a mixture of normals with equal variances, or of a mixture whose component densities were exponential, double exponential, gamma, or beta (as in Table 1), was used. In each case the mixture contained three densities.

Table 2 shows the case where the three densities are beta densities. The two end distributions ($\theta=1$ and $\theta=2$) are held fixed and the center distribution ($1<\theta<2$) is considered for several values of the translation parameter. The results show that when two of the $\theta$'s are close together, the error in the determination of their values is larger than for the case where they are far apart, as would be expected. In each case the $\Phi$ matrix had three large eigenvalues (i.e., substantially larger than zero) so that it was an easy matter to say that, for numerical purposes, the rank of $\Phi$ should be 3.


TABLE 2:  Determining Location Parameters for a Mixture of Betas

    $h(x) = \sum_{j=1}^{3} \lambda_j f^j(x-\theta_j)$

    $f^j(x-\theta_j) = \frac{(n_j+3)!}{n_j!\,2!}\,(x-\theta_j)^2 (1-x+\theta_j)^{n_j}$ for $\theta_j < x < \theta_j+1$, and $0$ for $x<\theta_j$, $x>\theta_j+1$

    $\lambda_j$      $\beta$   $n_j$    True $\theta$    True $\theta$ - Estimate
    1/3, 1/3, 1/3    1         8,4,4    1,1.01,2          .18852,  .00826,  .00002
                                        1,1.05,2         -.00091, -.00560, -.00003
                                        1,1.2,2           .00002, -.00001,  .00006
                                        1,1.4,2          -.00001, -.00029,  .00006
                                        1,1.6,2          -.00001, -.00027, -.00009
                                        1,1.8,2          -.00001, -.00104, -.00068

In Tables 3-6, the same basic experiment was repeated for the normal (equal variances), exponential, double exponential, and gamma densities. In these cases values of $\theta$ close to 1 were studied. Except in the case of a mixture of normals with the means .01 apart, in each case the rank of $\Phi$ was judged to be 3. The same difficulties with determining $\theta$-values occurred as were noticed with the beta mixtures.

Theorem 2 uses a scale factor $\beta = \frac{2\pi}{(K+1)\max|\theta_j|}$. The precise value of $\beta$ is not important to this theorem. We need only choose a scale factor so that $e^{ik\beta\theta}$, $k=0,1,\ldots,K$ does not repeat. In Tables 3-6 we explored the use of $\beta=1$ and $\beta=1.5$ and we noted that there can be considerable differences in the determination of the $\theta$-values. In a real case, the choice of $\beta$ would also presumably influence the accuracy of the answers; however, at this time we have not studied its effect enough to comment on possible appropriate values.

CONCLUDING  REMARKS 

To apply these methods one must know $\mathcal{F}$ in advance in order to determine the appropriate operator, e.g. $(i\omega)^{q-p}$, to apply to the characteristic function of the mixture, $H$. There is, however, some leeway. For example, we see from Table 1 that one could have a mixture of double exponentials, of gammas ($n=1$), or of betas of the form $\frac{(n+2)!}{n!}(x-\theta)(1-x+\theta)^n$, $0<x-\theta<1$, and still determine $M$ and $\theta$-values by using the operator $(i\omega)^2$. Thus, some inexact knowledge of the underlying mixture model can be tolerated. Since we have not as yet explored the estimation problems associated with these methods we

TABLE 3:  Determining Location Parameters for a Mixture of Normals

    $h(x) = \sum_{j=1}^{3} \lambda_j \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x-\theta_j)^2}$

    $\lambda_j$    $\beta$   True $\theta$    True $\theta$ - Estimate
    .5, .3, .2     1.0       1,1.01,2         -.00031, not found*,  .00011
                             1,1.03,2         -.00038, -.00008,  0
                             1,1.05,2          .00233,  .00641,  0
                             1,1.1,2          -.00012, -.00026, -.00001
    .5, .3, .2     1.5       1,1.01,2         -.00372, not found*,  .00011
                             1,1.03,2          .00059,  .00146,  .00001
                             1,1.05,2          .00005,  .00004,  0
                             1,1.1,2          -.00007, -.00001,  0

*The computer program could not distinguish between $\theta=1$ and $\theta=1.01$.
TABLE 4:  Determining Location Parameters for a Mixture of Exponentials

    $h(x) = \sum_{j=1}^{3} \lambda_j f^j(x-\theta_j)$

    $f^j(x-\theta_j) = b_j\, e^{-b_j(x-\theta_j)}$ for $x>\theta_j$, and $0$ for $x<\theta_j$

    $\lambda_j$    $\beta$   $b_j$    True $\theta$    True $\theta$ - Estimate
    .5, .3, .2     1         1,5,2    1,1.01,2          .19612,  .00249,  .00003
                                      1,1.03,2          .00266,  .00366,  .00002
                                      1,1.05,2          .00415,  .00014,  .00001
                                      1,1.1,2           .00192,  .00076,  .00001
    .5, .3, .2     1.5       1,5,2    1,1.01,2         -.00614, -.01134,  0
                                      1,1.03,2         -.00002, -.00005,  0
                                      1,1.05,2         -.00018, -.00011,  0
                                      1,1.1,2          -.00004, -.00002,  0
TABLE 5:  Determining Location Parameters for a Mixture of Double Exponentials

    $h(x) = \sum_{j=1}^{3} \lambda_j f^j(x-\theta_j)$

    $f^j(x-\theta_j) = \frac{ab}{a+b}\, e^{a(x-\theta_j)}$ for $x<\theta_j$, and $\frac{ab}{a+b}\, e^{-b(x-\theta_j)}$ for $x>\theta_j$

    $\lambda_j$    $\beta$   $a$   $b$   True $\theta$    True $\theta$ - Estimate
    .5, .3, .2     1         1     2     1,1.01,2         .12330, .00627, .00003
                                         1,1.03,2         .00110, .00338, 0
                                         1,1.05,2         .00213, .00831, .00004
                                         1,1.1,2          .00004, .00003, 0
    .5, .3, .2     1.5       1     2     1,1.01,2         .00297, .00600, 0
                                         1,1.03,2         .00050, .00076, 0
                                         1,1.05,2         .00024, .00043, 0
                                         1,1.1,2          .00008, .00012, 0

TABLE 6:  Determining Location Parameters for a Mixture of Gammas

    $h(x) = \sum_{j=1}^{3} \lambda_j f^j(x-\theta_j)$

    $f^j(x-\theta_j) = \frac{1}{n!\,\gamma^{n+1}}\,(x-\theta_j)^n\, e^{-(x-\theta_j)/\gamma}$ for $x>\theta_j$, and $0$ for $x<\theta_j$

    $\lambda_j$    $\beta$   $\gamma$   $n$   True $\theta$    True $\theta$ - Estimate
    .5, .3, .2     1         4          3     1,1.01,2         2.72623,  .00628,  .00011
                                              1,1.03,2         -.01094,  .01407, -.00001
                                              1,1.05,2         -.01472,  .02170,  0
                                              1,1.1,2           .00022,  .00086, -.00001
    .5, .3, .2     1.5       4          3     1,1.01,2          .00650,  .00618,  .00001
                                              1,1.03,2          .00042,  .00235,  0
                                              1,1.05,2          .00007,  .00015,  0
                                              1,1.1,2           .00001,  .00007,  0

cannot comment on whether or not such an inexact knowledge of the mixture will carry over to more general lack-of-fit problems when real data are encountered.

In this paper we have only considered the univariate case. However, at least in the case of mixtures of normals, it would appear that the multivariate extension is straightforward provided one is clever about choosing the sampling values of $\omega$. In future work we hope to consider multivariate extensions.
ABSTRACT

In this paper we describe our current efforts to develop methods and computer algorithms to effectively represent multivariate data commonly encountered in remote sensing applications. This may involve scatter diagrams but we are emphasizing multivariate representations of nonparametric probability density estimates. The density function provides a useful graphical tool for looking at data and a useful theoretical tool for classification. We call our approach a thunderstorm data analysis.


1.  Graphical  Tools  in  Data  Analysis 

A recent theme in multivariable data analysis as advocated by, for example, John and Paul Tukey [13] emphasizes graphical techniques for looking for multidimensional structure in data. The bivariate scatter diagram has been a very useful tool in this approach. For data in more than two dimensions, careful selection of bivariate projections can reveal structure in higher dimensions; see, for example, a description of the projection pursuit algorithm [3]. Alternatively, glyphs may be drawn instead of dots in a bivariate scattergram, with data values not otherwise displayed represented by features of the glyph, such as length, angle, etc. Computer graphics workstations have recently made trivariate scatter diagrams feasible. A true three-dimensional effect may be had by either continuous rotation of the scatter diagram or by a variety of stereographic techniques using red/green or polarized glasses. Holograms and rapidly vibrating mirrors also can provide 3-D effects. For data with more than three variables, side-by-side scatter diagrams of subsets of variables with visual links (such as coloring the same point in the different diagrams) allow a representation of the data.

Scatter diagrams do have limitations in data analysis. The most important problems relate to sample size. For moderately large samples (n > 500) data replication (or overstriking on the graphical medium) begins to occur frequently. This problem has been referred to as the problem of "too much ink" [12]. In one example of a fairly large 3-D scatter diagram with n = 22,932 on a 512 by 512 graphics terminal, only 4,000 pixels were observable [5]. With continuous rotation many more points are viewable but current computer technology limits real-time rotations to about one thousand points. Secondly, clusters of points that are close together are difficult to detect in scatter diagrams. In other words, scatter diagrams provide only modest indications of the density of points in a given region. Thirdly, our impression of data from the same underlying density function is highly dependent on the sample size. This makes comparisons of scatter diagrams with different sample sizes nontrivial. The eye naturally leaves the center of the data and focuses on outliers and apparent structure (lines) in outlying regions. Such features may or may not be of great importance depending on the objectives of the data analysis. In a recent example of a bivariate scatter diagram of 412,776 points, a frequency polygon analysis revealed
that  over  97%  of  the  points  fell  inside  the  1%  contour  (that  is,  points 
where  ^(x,y)=l%  of  ^(mode))  which  occupied  less  than  j~th  of  the 
display  area  [6].  Almost  half  of  the  pixels  in  the  display  area  were 
illuminated.  On  a 256  by  256  display,  many  points  were  replicated  over 
300  times  and  one  more  than  1000  times. 
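The overstriking effect is easy to reproduce. The sketch below is purely illustrative: it draws a hypothetical sample of 20,000 points from a bivariate normal distribution (not the data sets cited above), maps them onto a 64 by 64 display, and counts how many pixels are lit and how often the busiest pixel is hit.

```python
import random


def pixel_counts(points, size):
    """Map points in the unit square to a size-by-size display; count hits per pixel."""
    counts = {}
    for x, y in points:
        px = min(int(x * size), size - 1)
        py = min(int(y * size), size - 1)
        counts[(px, py)] = counts.get((px, py), 0) + 1
    return counts


def clamp01(v):
    return min(max(v, 0.0), 1.0)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
points = [(clamp01(rng.gauss(0.5, 0.1)), clamp01(rng.gauss(0.5, 0.1)))
          for _ in range(20000)]
counts = pixel_counts(points, 64)

lit_pixels = len(counts)              # what the eye can actually see
max_replication = max(counts.values())  # hits on the busiest pixel
```

The number of lit pixels is bounded by the display size no matter how large the sample becomes, so most of the information about relative density near the mode is lost in the plot.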

We also advocate using scatter diagrams for looking at data. However, since we are interested in discovering structure such as modes and high density regions, we have found that the density function is a more useful tool when taking a preliminary look at data in several dimensions. The density function does not change with sample size, although the quality of estimation changes. In a sense the scatter diagram points to the density function, as Jim Thompson has described it. In the next sections we describe our current work based on multivariate nonparametric density estimation.
2.  Computational and Representational Problems
in  Multivariate  Density  Estimation

Nonparametric density estimation methods for multivariate data are often simple extensions of well-studied univariate versions. The multivariate histogram is a computationally efficient estimator but suffers from empty bin problems and bin edge effects. Statistically more efficient and smoother multivariate estimators may be obtained by kernel or nearest neighbor methods; see Tapia and Thompson [10]. Efficient algorithms for the latter have been developed but little is known about nearest neighbor global properties beyond some pointwise results. Some empirical evidence indicates nearest neighbor estimates tend to peak at modes and some optimal binning studies seem to draw the same conclusion [11]. Some special attention and techniques are needed in the tails since the raw estimate does not have a finite integral.

Thus we believe at this time the fixed multivariate kernel estimator
of Cacoullos [2] is a useful technique for data in 2-4 dimensions.
Unfortunately, computational requirements grow rapidly in higher dimensions
if one desires to evaluate the estimate on a representative multivariate
mesh. The estimator also requires the entire raw data in
order to compute the pointwise estimates. Some research has focused on
one and two dimensional numerical approximations to kernel estimates in
order to achieve computational efficiency [9]. However, few results are
currently available for more variables.

Another approach is to construct a frequency polygon estimator
(formed by connecting with straight lines the mid-bin values of a histogram).
This estimator has the same order of statistical efficiency as
the kernel estimator and also the computational efficiency of the histogram.
However, bin edge effects still can be a problem for small samples
and in higher dimensions. Thus we have recently proposed a new density
estimator based on a frequency polygon of the averaged shifted histogram
(ASH) estimator [7]. The ASH is simply the pointwise average of m histograms
with common equally spaced bins of width h but different bin
origins $t_0 + ih/m$, $i = 0, \ldots, m-1$. Thus the ASH looks like a histogram with
bin width h/m. As $m \to \infty$ the ASH is identical to the statistically efficient
triangular kernel estimate. Values of m between 3 and 10 are sufficient
for most purposes. Multivariate versions are easily constructed
by shifting and averaging in all coordinate directions.
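
The averaging step can be sketched in a few lines of Python for the univariate case. This is only an illustration under stated assumptions, not the authors' implementation: the function name `ash_1d` and its arguments are hypothetical, the estimate is evaluated on narrow cells of width h/m starting at t0, and each shifted mesh is extended one bin to the left so that it covers the whole evaluation grid.

```python
import math

def ash_1d(data, t0, h, m, n_bins):
    """Average m histograms of bin width h whose origins differ by h/m.

    Returns the estimate on n_bins*m narrow cells of width h/m that
    start at t0 (one value per narrow cell).
    """
    n = len(data)
    delta = h / m
    n_cells = n_bins * m
    est = [0.0] * n_cells
    for i in range(m):
        # Mesh with origin t0 + i*h/m; shifting it by one full bin to the
        # left gives the same mesh and guarantees it covers the grid.
        origin = t0 + i * delta - h
        counts = [0] * (n_bins + 1)
        for x in data:
            b = math.floor((x - origin) / h)
            if 0 <= b <= n_bins:
                counts[b] += 1
        for c in range(n_cells):
            center = t0 + (c + 0.5) * delta
            b = math.floor((center - origin) / h)
            est[c] += counts[b] / (n * h)   # histogram density in that bin
    return [v / m for v in est]
```

With m = 1 the function reduces to the ordinary histogram; larger m gives the smoother, finer-grained estimate described above.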

Representational difficulties have been addressed for three and
four variable density estimates (function surfaces in four and five
dimensions, respectively) by displaying appropriate contour plots. For
trivariate data a contour of $\hat{f}(x,y,z)$ will be a set of points

$S_c = \{ (x,y,z) \in R^3 : \hat{f}(x,y,z) = c \}$.

The set $S_c$ will be a surface in $R^3$ (or more than one surface if the
density is multimodal at this level). On a graphics terminal we have
chosen to represent $S_c$ by intersecting it with a series of equally
spaced planes orthogonal to the x-axis, say, and then drawing the contours
defined by these intersections. The resulting "wire" diagrams
give a strong 3 dimensional impression. If color is available, several
contour levels may be simultaneously displayed by using a different
color for each level. We refer to our picture as a thunderstorm data
representation.


It is helpful to imagine what this representation looks like for
trivariate Gaussian data. For the independent variable case, $S_c$ is simply
a sphere so that a color display would show several concentric
spheres with the mode located at the center. This is roughly illustrated
in Figure 1. If the data are correlated we will see ellipsoids
rather than spheres.

To represent the density estimate of four variables, $\hat{f}(x,y,z,t)$,
we look at the sets

$S_{t,c} = \{ (x,y,z) \in R^3 : \hat{f}(x,y,z,t) = c \}$.

Here we have arbitrarily chosen one variable and placed it in a reference
frame which may conveniently be thought of as a "time" axis. By
looking at a time-lapse sequence of representations of $S_{t,c}$ we obtain a
useful view of the data which highlights important features such as
modes, outliers, symmetry, skewness, and covariance structure. This
sequence is similar to a time-lapse movie of a thunderstorm from its
original formation to peak of storm to its eventual end.

Again it is useful to construct this representation for quadrivariate
Gaussian data. For a fixed contour level c, as t moves through the
relevant interval of support $(t_{min}, t_{max})$, $S_{t,c}$ will be a sequence of
initially expanding spheres (ellipsoids) which continue to grow until
the mode is reached and then contract, finally vanishing when $S_{t,c}$
becomes the null set.
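
The expanding and contracting spheres can be checked numerically in the simplest case. The sketch below assumes, purely for illustration, a standard Gaussian in four variables (independent, unit variance, mode at the origin) and a contour level written as a fraction p of the density at the mode; under those assumptions the slice at time t is a sphere with squared radius $-2 \ln p - t^2$. The function name `slice_radius` is hypothetical.

```python
import math

def slice_radius(t, p):
    """Radius of the contour slice at time t for a standard 4-variate
    Gaussian, at level c = p * (density at the mode), with 0 < p < 1.

    Returns None when the slice is the null set.
    """
    r2 = -2.0 * math.log(p) - t * t
    return math.sqrt(r2) if r2 >= 0.0 else None

# Time-lapse sequence: the radius grows until t = 0 (the mode), then shrinks.
sequence = [slice_radius(t / 2.0, 0.01) for t in range(-7, 8)]
```

For correlated data the same calculation gives ellipsoids, with the quadratic form of the inverse covariance matrix in place of the sum of squares.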

We have recently experimented with these representations using
Landsat remote sensing reflectance intensity data sets in $R^3$ (n =
23,000) and with a particle physics data set in $R^4$ (n = 500); see
Scott [5] and Scott and Thompson [8]. A 16mm color film was used to
record the time-lapse thunderstorm representation of the particle physics
data set. These data have been analyzed by Friedman and Tukey [3]
and by Tukey and Tukey [13] using exploratory data and scatter diagram
techniques. Our representations seem to be successful in uncovering
important data features and structure and seem to require less training
in the four dimensional case than is required for four dimensional rotating
scatter diagram methods.

3.  Graphical  and  Model-Based  Discrimination 
and  Classification 

We shall assume that our data samples are labelled so that supervised
clustering and discrimination are feasible. As a preliminary
step, side-by-side scatter diagrams may be displayed to get a rough
feeling for the separability of cluster classes. This may also be
accomplished by displaying side-by-side density contour plots for the
cluster classes. For large training samples the latter is more useful
(see the comparison of a scatter diagram and contour plot for 412,776
points mentioned in section 1). The scatter diagram might indicate no
separation at all.

When the preliminary density estimates have been refined by optimal
data-based choices of smoothing parameters, classification may be accomplished
using a Bayesian classifier. Evaluation of the averaged shifted
histogram for each class involves only a bin location operation (subtraction
and division) and then a table lookup for each training class
(hash function, perhaps). This is a computationally efficient operation,
although large memory requirements are necessary in several dimensions.
We plan to implement this strategy and report on our results shortly.
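
The bin location and table lookup steps can be sketched as follows. This is a simplified illustration, not the planned implementation: it uses one ordinary histogram per class instead of an averaged shifted histogram, stores only occupied bins in a dictionary (playing the role of the hash table), and all function names are hypothetical.

```python
from collections import Counter

def bin_index(point, origin, width):
    """Bin location: one subtraction and one division per coordinate."""
    return tuple(int((x - o) // width) for x, o in zip(point, origin))

def fit_tables(samples_by_class, origin, width):
    """Build one sparse density table per training class."""
    dim = len(origin)
    tables = {}
    for label, samples in samples_by_class.items():
        counts = Counter(bin_index(p, origin, width) for p in samples)
        norm = len(samples) * width ** dim
        tables[label] = {b: c / norm for b, c in counts.items()}
    return tables

def classify(point, tables, priors, origin, width):
    """Bayes rule: pick the class with the largest prior times density."""
    b = bin_index(point, origin, width)
    scores = {k: priors[k] * t.get(b, 0.0) for k, t in tables.items()}
    return max(scores, key=scores.get)
```

Storing only the occupied bins is one way to soften the memory requirement in several dimensions, at the price of a hash lookup per class.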

4.  Examples

We shall consider the scatter diagram approach discussed in section
3 as a preliminary step towards producing a nonparametric classifier.
The data are trivariate and come from a model applied to individual pixels
(1.1 acre) using temporally measured Landsat data. Approximately
biweekly 4-channel remote sensing reflectance intensity data were converted
into a single "greenness" time series by looking at a certain
linear combination of the 4-channel data. The time series was fitted by
Badhwar's [1] growth model which looks somewhat like a bell-shaped
curve. For each pixel three parameters from Badhwar's model were
extracted: x, the time of peak greenness; y, the ripening or reproduction
period; and z, the peak greenness level. Each measurement was
recorded on a discrete scale from 0 to 249. The data are processed in a
segment which is 5 by 6 nautical miles and contains 22,932 (117 by 196)
pixels. Ground truth was obtained by sending observers to the fields.

In Figure 2 we show a view of the 3-D scatter diagram for segment
1380 in Minnesota, 1978. Notice the orientation of the axes (located at
the true origin) in this projected and rotated view. The projected x-axis
is defined by the vector (-.71, .71, 0) and the y-axis is defined by
(-.58, -.58, .58). This scatter diagram is a mixture of "pure" and
"mixed" pixels. In Figure 3 we show a scatter diagram of 3,947 pure
pixels of corn from segment 1380. Figure 4 depicts the 5,162 pure pixels
of soybeans. A quick impression of the separability of corn and
soybeans is possible from these graphs, but again recall that a large
fraction of the data are hidden, making the discrimination judgment very
difficult.

Small grains present a difficult problem for a classifier. In Figure
5 we view segment 1899 in North Dakota, 1977, using the same projection
plane as before. The two segments look quite different in this
representation. Figure 6 represents 1,756 pure pixels of sugar beets.
Figure 7 represents 3,355 pure pixels of spring wheat. Finally, Figure
8 shows 4,362 pure pixels of barley. These classes present a challenge
for any discrimination procedure.

5.  Conclusion 

We have attempted to illustrate how nonparametric density methods
may be brought to bear directly on multivariate remote sensing problems.
Multivariate parametric models based on mixture models [4] have many
advantages, both conceptually and in production mode. The fitting problems
in the parametric case are usually quite difficult. We hope to
investigate how nonparametric models may provide guidance to the fitting
and verification of such parametric models. This would be a direct use
of the exploratory capabilities of the nonparametric models.

Figure 1.  Representation of Contours of Three Dimensional Density Estimate if Gaussian.

Figure 2.  Projected and Rotated Three Dimensional Scatter Diagram of Segment 1380 (1978).

Figure 4.  Pure soybean pixels in segment 1380 (n=5,162).

Figure 6.  Sugar beet pixels in segment 1899 (n=1,756).

Figure 7.  Spring wheat pixels in segment 1899 (n=3,355).

Figure 8.  Barley pixels in segment 1899 (n=4,362).

ABSTRACT

A scene segmentation approach is presented which is based on generating
autoregressive field models for each scene component (class)
from its a priori spatial statistics. A methodology is also described
for using these models in achieving optimal segmentation of a scene.
The derivations are presented for the case of single band imagery; however,
the method is believed to be extendable to multispectral data.

1.  Introduction

A subject of central importance in image pattern recognition and
analysis has been scene segmentation and classification of scene components.
In addressing this subject, a number of different methodologies
and approaches have been proposed and implemented. These range
from simple thresholding concepts to methods that define a scene component
by a set of texture measures and achieve segmentation using such
measures [1].

The research reported in this paper is concerned with the
development of techniques for segmentation when the scene components
(referred to as classes) are or can be described statistically. Specifically,
the concepts and procedures that are developed apply to the cases
where the scene components are members of a two-dimensional and stationary
Gaussian process. Though the final goal of this activity is to have
segmentation techniques for multispectral data, this report covers the
approach for a single band image. The extension of the derived methods
for application to multispectral data is currently under investigation.

Statistical description of scene components has been established as
a viable approach in pattern recognition and image analysis [1]-[4]. In
the following, the approach taken is that of first describing each class
by an autoregressive model using the a priori statistics of that class
and then employing these models in achieving segmentation. After the
general notation is established in Part 2, the modeling technique is derived
in Part 3. In Part 4 the segmentation technique which uses the
derived models is presented and discussed.

2.  Preliminaries and Notations

For a single band image let there be M classes $w_1, \ldots, w_M$, where
the intensities of pixels in each class are a sample function of a two-dimensional
(2-D) Gaussian and stationary random process with known a
priori means $\mu_1, \ldots, \mu_M$ and autocorrelations $R_1(\tau_1,\tau_2), \ldots, R_M(\tau_1,\tau_2)$.
So for the $k$th class, the a priori mean $\mu_k$ and the autocorrelation
$R_k(\tau_1,\tau_2)$ are defined as

(2.1)  $\mu_k = E\, I^k(i,j)$

       $R_k(\tau_1,\tau_2) = E\,[I^k(m,n) - \mu_k][I^k(i,j) - \mu_k]$

where $\tau_1 = |m-i|$, $\tau_2 = |n-j|$, $I^k(i,j)$ denotes the intensity value at
pixel location (i,j) in the $k$th class and E is the expectation operator.

In the subsequent sections, autoregressive models of various orders
will be defined and used. Figure 1 defines what is meant by specifying
various autoregressive model orders on a two-dimensional grid. Thus a
first order model for location (i,j) contains the pixels {(i-1,j), (i,j-1),
(i-1,j-1)}, and a second order model contains the pixels {(i-1,j),
(i,j-1), (i-1,j-1), (i-2,j), (i,j-2), (i-2,j-1), (i-1,j-2), (i-2,j-2)}
and so on. Index i represents the line (row) indicator and j is the
sample (column) indicator on a 2-D grid.
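
The neighbor sets listed above follow a simple pattern: an order P model at (i,j) uses every pixel (i-k, j-l) with 0 <= k, l <= P except (i,j) itself. A short sketch, with hypothetical function names, makes this explicit:

```python
def causal_support(P):
    """Offsets (k, l) of an order-P model: 0 <= k, l <= P, (k, l) != (0, 0)."""
    return [(k, l) for k in range(P + 1) for l in range(P + 1)
            if (k, l) != (0, 0)]

def neighbors(i, j, P):
    """Pixel locations used by an order-P model at location (i, j)."""
    return [(i - k, j - l) for k, l in causal_support(P)]
```

An order P model therefore has (P+1)^2 - 1 neighbors: 3 for the first order model and 8 for the second order model.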
3.  Autoregressive Modeling Procedure

Autoregressive models have been analyzed and used in the area of
image processing and analysis for some time [1]-[3], [5]-[6]. In general,
for a zero mean Gaussian random process x(i,j), these models are
of the form [7]

(3.1)  $x(i,j) = \sum\sum_{(p,q) \in D} \alpha_{pq}\, x(i-p, j-q) + U(i,j)$

where,

(3.2)  $D = \{(p,q) : -M \le p \le M,\ -N \le q \le N,\ (p,q) \ne (0,0)\}$

and U(i,j) are a set of independent Gaussian random variables, where

(3.3)  $E\, U(i,j) = 0$

       $E\, U(i,j)\, U(k,\ell) = \sigma^2$ if $i = k$ and $j = \ell$, and $0$ otherwise.

$\alpha_{pq}$ and $\sigma^2$ are constants if x(i,j) is stationary and they are a function
of (i,j) if x(i,j) is nonstationary.

A causal form of the model in (3.1) is the subject of interest in
this paper. In this causal form (3.1) is written as

(3.4)  $x(i,j) = \sum_{p=0}^{P} \sum_{q=0}^{P} \alpha_{pq}\, x(i-p, j-q) + U(i,j)$,   $p + q \ne 0$

where, again with stationarity, $\alpha_{pq}$ are constants and U(i,j) are a set
of identically distributed random variables satisfying (3.3). Here P is
the order of the autoregressive model corresponding to the definition of
the model order given in Figure 1. An example of such a causal model is
the first order model

(3.5)  $x(i,j) = \alpha_{01}\, x(i,j-1) + \alpha_{10}\, x(i-1,j) + \alpha_{11}\, x(i-1,j-1) + U(i,j)$

which has a two-dimensional separable correlation function [6] of the
form

(3.6)  $R(\tau_1,\tau_2) = \exp\{ -\beta_1 |\tau_1| - \beta_2 |\tau_2| \}$

The thrust of modeling in segmenting a scene is to transform the
information provided a priori about each class (namely the correlation)
into an autoregressive model and use these models in subsequent development
of segmentation methods. Clearly the choice of autoregressive
forms is arbitrary and there is no claim made here that all classes
can be modeled by such forms. However, the causality restriction that
has been imposed (and will be adhered to throughout this paper) is
necessitated by the particular modeling procedure described in 3.1 and
the properties of the derived models which are discussed in 3.2.

3.1.  The Autoregressive Modeling Technique

In the following a procedure is developed for deriving the model
from the given a priori correlation. Since this process is done for
each class, the class indicator (superscript k) is omitted from all
arguments in the ensuing discussion.

For a given 2-D and stationary correlation function $R(\tau_1,\tau_2)$, let
us assume a model order P. First we will develop a technique for defining
the model for a given P and then we will show how the "best" order P
is chosen. For a given order P, the model is

(3.7)  $x(i,j) = \sum_{k=0}^{P} \sum_{\ell=0}^{P} \alpha_{k\ell}\, x(i-k, j-\ell) + U(i,j)$,   $k + \ell \ne 0$

This model is completely defined if the values of all the constants $\alpha_{k\ell}$
and the variance of the zero mean white noise process U(i,j) are known.
Thus, for a given order P, there are $(P+1)^2$ unknowns to be computed,
where $(P+1)^2 - 1$ of these are the unknowns $\alpha_{k\ell}$ and one unknown is $\sigma^2$, where

(3.8)  $\sigma^2 = E\, U^2(i,j)$

The criterion adopted here for computing these unknown parameters
is that of minimum variance of U(i,j). Thus $\alpha_{k\ell}$ are found such that
$E\, U^2(i,j)$ is minimized and $\sigma^2$ is taken to be that minimum value. From
(3.7)

(3.9)  $E\, U^2(i,j) = E\,[\, x(i,j) - \sum_{k=0}^{P} \sum_{\ell=0}^{P} \alpha_{k\ell}\, x(i-k, j-\ell) \,]^2$,   $k + \ell \ne 0$

Differentiating (3.9) with respect to the $\alpha_{k\ell}$'s and setting it equal to zero
results in the $(P+1)^2 - 1$ equations

(3.10)  $E\,[\, x(i,j) - \sum_{k=0}^{P} \sum_{\ell=0}^{P} \alpha_{k\ell}\, x(i-k, j-\ell) \,]\, x(m,n) = 0$,   $k + \ell \ne 0$

        $m = i, i-1, \ldots, i-P$
        $n = j, j-1, \ldots, j-P$
        $(m,n) \ne (i,j)$

Carrying the expectation operator through in (3.10) and rearranging the
terms results in a system of linear equations of the form

(3.11)  $A \alpha = b$

where elements of the vector $\alpha$ are the coefficients $\alpha_{k\ell}$ and the elements
of the matrix A and the vector b are values of the correlation function
$R(\tau_1,\tau_2)$.
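Having solved for the coefficients $\alpha_{k\ell}$ in (3.11), it remains to
determine the quantity $\sigma^2$ in order to have the model defined. Expanding
the quadratic form in (3.9), $\sigma^2$ can be written as

(3.12)  $\sigma^2 = E\, U^2(i,j) = E\{ [\, x(i,j) - \sum_{k=0}^{P} \sum_{\ell=0}^{P} \alpha_{k\ell}\, x(i-k, j-\ell) \,]\, x(i,j) \}$
        $\qquad - E\{ [\, x(i,j) - \sum_{k=0}^{P} \sum_{\ell=0}^{P} \alpha_{k\ell}\, x(i-k, j-\ell) \,] [\, \sum_{k=0}^{P} \sum_{\ell=0}^{P} \alpha_{k\ell}\, x(i-k, j-\ell) \,] \}$,   $k + \ell \ne 0$

But from the relations in (3.10)

$E\{ [\, x(i,j) - \sum_{k=0}^{P} \sum_{\ell=0}^{P} \alpha_{k\ell}\, x(i-k, j-\ell) \,] [\, \sum_{k=0}^{P} \sum_{\ell=0}^{P} \alpha_{k\ell}\, x(i-k, j-\ell) \,] \} = 0$,   $k + \ell \ne 0$

Thus

(3.13)  $\sigma^2 = E\,[x(i,j)]^2 - \sum_{k=0}^{P} \sum_{\ell=0}^{P} \alpha_{k\ell}\, E\, x(i,j)\, x(i-k, j-\ell)$
        $\quad\ = R(0,0) - \sum_{k=0}^{P} \sum_{\ell=0}^{P} \alpha_{k\ell}\, R(k,\ell)$,   $k + \ell \ne 0$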
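
The modeling step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the author's code: the correlation is supplied as a hypothetical function `R(t1, t2)` of absolute lags, as in (2.1); the linear system is formed from the orthogonality conditions (3.10) and solved by a small Gaussian elimination routine written here only to keep the example self-contained; the variance then follows from (3.13).

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_causal_ar(R, P):
    """Return ({(k, l): alpha_kl}, sigma2) for a causal model of order P."""
    support = [(k, l) for k in range(P + 1) for l in range(P + 1)
               if (k, l) != (0, 0)]
    # One equation per neighbor (kk, ll):
    #   R(kk, ll) = sum over (k, l) of alpha_kl * R(|k - kk|, |l - ll|)
    A = [[R(abs(k - kk), abs(l - ll)) for (k, l) in support]
         for (kk, ll) in support]
    b = [R(kk, ll) for (kk, ll) in support]
    alpha = solve_linear(A, b)
    # Minimum noise variance: R(0,0) - sum alpha_kl * R(k, l)
    sigma2 = R(0, 0) - sum(a * R(k, l) for a, (k, l) in zip(alpha, support))
    return dict(zip(support, alpha)), sigma2
```

For a separable correlation such as `R = lambda t1, t2: 0.8 ** t1 * 0.5 ** t2`, the first order fit recovers a model of the form (3.5), and raising P leaves the noise variance unchanged.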


To have completely defined the modeling process, it remains to show
how the model's order P is chosen. Before stating the process that allows
one to choose the optimal order, let us review the objective
of the modeling endeavour and what is meant by optimal. As stated
before, the objective is that of generating an autoregressive model
whose second moment characteristics (the correlation function) approximate
the given a priori correlation function $R(\tau_1,\tau_2)$ as closely as
one wishes. However, the criterion chosen for defining the model has
been minimization of the white noise variance. Besides the intuitive
appeal of this criterion, it will be shown in the next section that this
criterion also satisfies the stated objective above. Hence finding the
best order is achieved by generating models of various orders and choosing
the one whose white noise has minimum variance. In general, then,
successively higher order models are assumed and their parameters $\alpha_{k\ell}^{P}$
and $\sigma_P^2$, $P = 1, 2, \ldots$, are computed. Optimal choice of P is made
according to one or more of the following:

1.  $\sigma_P^2$ does not change with increasing P, i.e., $\sigma_{P+1}^2 = \sigma_P^2$. This
    is the case where the underlying process has an exact autoregressive
    model of order P as will be shown in Section 3.2.

2.  Only a few values of the a priori correlation function $R(\tau_1,\tau_2)$
    are specified, which limits how high an order P can be
    chosen.

3.  Rate of decrease of $\sigma_P^2$ as P increases. This is the case
    where the underlying process does not lend itself to a small
    order regression model, in which case an approximate model is
    chosen on the basis of trade-off between the decrease in $\sigma_P^2$
    and additional segmentation cost and complexity due to the
    increase in the number of model coefficients. As an example,
    if $\sigma^2$ varies as in Figure 2 as a function of P, then the value
    $\hat{P}$ could be taken as the best order.

Figure 2.  ($\sigma^2$ plotted against model order P.)
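
Criteria 1 and 3 can be combined into a simple stopping rule. The sketch below is only one possible reading: the relative tolerance `rel_tol` and the function name `choose_order` are assumptions introduced here, since the text gives no numerical threshold for the trade-off.

```python
def choose_order(sigma2_by_order, rel_tol=0.01):
    """Pick the smallest order whose successor barely lowers the variance.

    sigma2_by_order[0] is the noise variance for P = 1, the next entry is
    for P = 2, and so on. rel_tol is an illustrative threshold.
    """
    for idx in range(len(sigma2_by_order) - 1):
        cur, nxt = sigma2_by_order[idx], sigma2_by_order[idx + 1]
        if cur - nxt <= rel_tol * cur:
            return idx + 1          # orders are numbered from 1
    return len(sigma2_by_order)     # no plateau found in the given range
```

With `rel_tol=0` the rule reduces to criterion 1, an exact plateau in the variance; a larger tolerance trades a small loss in fit for fewer model coefficients.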
3.2.  Properties of the Modeling Technique

The properties are:

1.  If the underlying 2-D process satisfies a finite order autoregressive
    model, this procedure will find that model. The
    proof of this property is given in Appendix A.

2.  When an approximate model of order P is chosen, the correlation
    generated by this model matches the a priori correlation
    at, at least, (P+1) points. The proof of this property is
    given in Appendix B.

3.  In deriving the model, only numerical values of the correlation
    $R(\tau_1,\tau_2)$ are needed and no analytic form is required.
    Therefore in practice, $R(\tau_1,\tau_2)$ can be obtained numerically
    using training areas.

4.  Though beyond the scope of present considerations, this
    method is believed to be applicable when the stationarity constraint
    is removed and nonstationary processes are to be modeled.

5.  For a given correlation function, the described procedure will
    always generate a model. This model, however, may be unstable,
    hence unacceptable for our purposes since it cannot represent
    a homogeneous process. Under these circumstances, then, tests
    must be performed to ensure stability [10].

4.  Scene Segmentation

Having found either an exact or an approximate autoregressive model
for each class, the following describes how these models are used in
achieving optimal segmentation. The optimality criterion is derived
in Appendix C and it is evident that this criterion is somewhat different
from the familiar classification criterion. This is to be expected
since the segmentation process, by nature, is not only a
classification process but a partitioning process as well.

Development of a general segmentation method that satisfies all
the intrinsic conditions of the optimality criterion of (C-8) is
currently under investigation. In the next part, however, a segmentation
method is presented which divides the image into blocks (a group
of pixels) and classifies each block according to the optimality
principle.

4.1.  Segmentation Procedure

Let the models associated with the M classes $w_1, \ldots, w_M$ be of
orders $P_1, \ldots, P_M$, respectively, and let

(4.1)  $P = \max\{P_1, \ldots, P_M\}$

The segmentation of the image is achieved by dividing the entire image
into blocks of $(P+1) \times (P+1)$ in size and classifying the individual blocks,
starting at the upper left hand corner and proceeding row by row. Let
$B_{ij}$ designate the block in row i and column j. Within each block let
the intensities of the image be $y(k,\ell)$, $k = 1, \ldots, P$ and $\ell = 1, \ldots, P$,
and finally let the pixels in $B_{ij}$ be rearranged in the vector $y_{ij}$ as
follows:

(4.2)  $y_{ij} = \{ y(1,1), y(1,2), \ldots, y(1,P), y(2,1), \ldots, y(P,P) \}$

       $\{ y(1,1), \ldots, y(P,P) \} \in B_{ij}$

A given block $B_{ij}$ is considered to be a starting block if the three
blocks $B_{i,j-1}$, $B_{i-1,j}$ and $B_{i-1,j-1}$ either do not exist (i.e., $B_{ij}$ is on
the uppermost or the left hand most part of the image) or these blocks
do exist but they have not all been classified into the same class (for
example, if $B_{i,j-1} \in w_2$ while $B_{i-1,j}$ belongs to a different class). With
this definition, then, the segmentation process will be totally defined by describing
how a starting and a non-starting block are classified.
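
The starting block definition translates directly into a small test. In this sketch the array `labels` is a hypothetical structure holding the class label of each block already classified (and `None` for blocks not yet classified), indexed from zero:

```python
def is_starting_block(labels, i, j):
    """True if block (i, j) is a starting block.

    A block is a starting block when its left, upper and upper-left
    neighbors do not all exist, or do not all share one class label.
    """
    if i == 0 or j == 0:
        return True                       # on the top row or left column
    prev = {labels[i][j - 1], labels[i - 1][j], labels[i - 1][j - 1]}
    return len(prev) != 1 or None in prev
```

Because the blocks are visited row by row from the upper left corner, the three neighbors of an interior block have always been classified by the time the block itself is reached.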

Assuming equal a priori probability of occurrence of each class, $w_1, \ldots, w_M$,

(4.3)  $P(w_1) = P(w_2) = \ldots = P(w_M)$

a starting block $B_{ij}$ is classified to the class $w_k$ if

(4.4)  $p(y_{ij} \mid w_k) \ge p(y_{ij} \mid w_\ell)$,   $\ell = 1, \ldots, M$

Since

(4.5)  $p(y_{ij} \mid w_\ell) = \frac{1}{(2\pi)^{N/2} |\phi_\ell|^{1/2}} \exp\{ -\tfrac{1}{2} (y_{ij} - \mu_\ell)^T \phi_\ell^{-1} (y_{ij} - \mu_\ell) \}$

where $N = (P+1)^2$, $\mu_\ell$ is an $N \times 1$ vector whose
elements are the mean value of class $w_\ell$ and $\phi_\ell$ is the covariance matrix
of the vector $y_{ij}$ as defined in (4.2). Note that for each class $w_\ell$, the
matrix $\phi_\ell$ is determined from the a priori class statistics in (2.1),
hence $|\phi_\ell|$ and $\phi_\ell^{-1}$ are computed only once for each class.

Substituting (4.5) in (4.4), taking the natural logarithm and simplifying
both sides yields the following rule for classifying a starting
block $B_{ij}$:

(4.6)  $B_{ij} \in w_k$ if

       $\ln|\phi_k| + (y_{ij} - \mu_k)^T \phi_k^{-1} (y_{ij} - \mu_k) \le \ln|\phi_\ell| + (y_{ij} - \mu_\ell)^T \phi_\ell^{-1} (y_{ij} - \mu_\ell)$

       for all $\ell = 1, \ldots, M$
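
Rule (4.6) can be sketched as follows. The code is illustrative only and rests on a few assumptions: the class covariance matrix is built from a hypothetical correlation function `R(t1, t2)` of absolute lags with the block pixels in the row-major order of (4.2), the class mean is a scalar repeated over the block, and a Cholesky factor (computed once per class) stands in for the explicit inverse and determinant.

```python
import math

def block_covariance(R, side):
    """Covariance matrix of a side x side block in row-major pixel order."""
    cells = [(r, q) for r in range(side) for q in range(side)]
    return [[R(abs(r1 - r2), abs(q1 - q2)) for (r2, q2) in cells]
            for (r1, q1) in cells]

def cholesky(S):
    """Lower triangular L with S = L L^T (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = S[r][c] - sum(L[r][k] * L[c][k] for k in range(c))
            L[r][c] = math.sqrt(s) if r == c else s / L[c][c]
    return L

def discriminant(y, mu, L):
    """ln|phi| + (y - mu)^T phi^{-1} (y - mu), where phi = L L^T."""
    d = [v - mu for v in y]
    z = []                                  # forward substitution: L z = d
    for r in range(len(d)):
        z.append((d[r] - sum(L[r][k] * z[k] for k in range(r))) / L[r][r])
    log_det = 2.0 * sum(math.log(L[r][r]) for r in range(len(d)))
    return log_det + sum(v * v for v in z)

def classify_starting_block(y, classes):
    """classes maps label -> (mean, Cholesky factor); smallest value wins."""
    return min(classes, key=lambda k: discriminant(y, classes[k][0], classes[k][1]))
```

Factoring each class covariance once mirrors the remark that the determinant and inverse need to be computed only once per class.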

Now if $B_{ij}$ is not a starting block this means that $B_{i,j-1}$, $B_{i-1,j}$
and $B_{i-1,j-1}$ have all been already classified into the same class, say
$w_k$. The block $B_{ij}$ is also classified in the class $w_k$ if

(4.7)  $p(y_{ij} \mid y_{i,j-1}, y_{i-1,j}, y_{i-1,j-1}, w_k) \ge p(y_{ij} \mid w_\ell)$

       for all $\ell = 1, \ldots, M$, $\ell \ne k$

Otherwise, $B_{ij}$ is classified in class $w_n$ where

(4.8)  $p(y_{ij} \mid w_n) \ge p(y_{ij} \mid w_\ell)$

       for all $\ell = 1, \ldots, M$;  $n, \ell \ne k$

In other words, if (4.7) is not satisfied, then $B_{ij}$ is determined not to
belong to $w_k$, and is treated as a starting block for any other class
except $w_k$.

The right hand side of (4.7) and both sides of (4.8) are evaluated
using (4.5). The left hand side of (4.7), however, is to be evaluated
using the autoregressive model of the class $w_k$. Let the zero mean model
of this class be

$x(i,j) = \sum_{m=0}^{P_k} \sum_{n=0}^{P_k} \alpha_{mn}^{k}\, x(i-m, j-n) + U(i,j)$,   $m + n \ne 0$

which indicates that each element of the vector $y_{ij}$ in (4.2) satisfies

(4.9)  $y(r,q) - \mu_k = \sum_{m=0}^{P_k} \sum_{n=0}^{P_k} \alpha_{mn}^{k}\, [\, y(r-m, q-n) - \mu_k \,] + U(r,q)$,   $m + n \ne 0$

where r and q now refer to the actual location on the two-dimensional
grid in the image. For each element of $y_{ij}$ corresponding to location
(r,q) on the image let

(4.10)  $\hat{y}(r,q) = \sum_{m=0}^{P_k} \sum_{n=0}^{P_k} \alpha_{mn}^{k}\, [\, y(r-m, q-n) - \mu_k \,]$,   $m + n \ne 0$

Substituting (4.10) in (4.9) results in

(4.11)  $y(r,q) - \mu_k - \hat{y}(r,q) = U(r,q)$

But since U(r,q) are a set of independent variables, the left hand side
of (4.7) is equivalent to

(4.12)  $P = p(y_{ij} \mid y_{i,j-1}, y_{i-1,j}, y_{i-1,j-1}, w_k) = p(U(r_1,q_1))\, p(U(r_2,q_2)) \cdots p(U(r_N,q_N))$

where again $(r_\ell, q_\ell)$ is the location of the $\ell$th element of $y_{ij}$ on the
image. Substituting (4.11) in (4.12) yields

(4.13)  $P = \frac{1}{(2\pi)^{N/2} \sigma_k^{N}} \exp\{ -\frac{1}{2\sigma_k^2} \sum_{\ell=1}^{N} ( y(r_\ell,q_\ell) - \hat{y}(r_\ell,q_\ell) - \mu_k )^2 \}$

where

$\sigma_k^2 = E\, U^2(r,q)$

$N = (P+1)^2$

As before, for the sake of comparison in (4.7) the quantity

(4.14)  $P' = N \ln(\sigma_k^2) + \frac{1}{\sigma_k^2} \sum_{\ell=1}^{N} ( y(r_\ell,q_\ell) - \hat{y}(r_\ell,q_\ell) - \mu_k )^2$

is used in the actual implementation.
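
The quantity in (4.14) can be computed directly from the image and the class model. The sketch below is illustrative, with hypothetical names: `image` is a two-dimensional list of intensities, the block occupies `side` rows and columns starting at `(top, left)`, and `alpha` maps each offset (m, n) to its coefficient. Each pixel is predicted from its causal neighbors as in (4.10), and the squared residuals of (4.11) are accumulated.

```python
import math

def conditional_score(image, top, left, side, alpha, mu, sigma2):
    """N ln(sigma2) + (1/sigma2) * sum of squared prediction residuals."""
    reach = max(max(m, n) for (m, n) in alpha)
    if top < reach or left < reach:
        raise ValueError("block lacks the causal neighbors the model needs")
    total = 0.0
    for r in range(top, top + side):
        for q in range(left, left + side):
            # Prediction from causal neighbors, as in (4.10)
            pred = sum(a * (image[r - m][q - n] - mu)
                       for (m, n), a in alpha.items())
            resid = image[r][q] - mu - pred       # residual of (4.11)
            total += resid * resid
    return side * side * math.log(sigma2) + total / sigma2
```

A smaller value corresponds to a larger conditional likelihood, so this score is compared in the same way as the two sides of (4.6).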



4.2.  Optimality  of  the  Segmentation  Procedure 

In order to discuss the optimal characteristics of the procedure
of 4.1 it must be pointed out that the procedure as presented takes a
group of pixels (a block) and classifies them (it) into a given class.
Hence on a pixel by pixel basis, the procedure cannot be optimal
since a class boundary can be such that it goes through a given block
while the procedure, as it stands now, will classify all the pixels in
that block into a particular class. However, ignoring the misclassification
of the pixels around the boundaries and viewing the image in
a block form, the question remains as to whether the blocks are
classified optimally or not.

At this stage, however, instead of considering the overall optimality
of the procedure let us consider implications of the optimality
rule when a non-starting block is processed and classified. The reason
for this limited analysis, at this time, is the author's belief that
it is this part of the process that sheds the most light on the development
of future optimal segmentation techniques. So let $B_{ij}$ be an
arbitrary non-starting block and let us assume that the segmentation
achieved up to $B_{ij}$ has been optimal. Let B be the set of all the blocks
previous to $B_{ij}$ (in the operational scheme of the last section) that
have already been optimally segmented. For the sake of notational ease,
and without loss of generality, let us further assume that B is
classified into a particular class $w_p$. So

(4.15)  $p(B \mid w_p) \ge p(B_1 \mid w_a) \cdots p(B_\ell \mid w_b)$

for all $a, b = 1, \ldots, M$ and all subsets $B_m$ of B. Now if the procedure
classified $B_{ij}$ into $w_p$ as well, then from (4.7)

(4.16)  $p(B_{ij} \mid B_{i,j-1}, B_{i-1,j}, B_{i-1,j-1}, w_p) \ge p(B_{ij} \mid w_k)$

        for all $k \ne p$

But due to the Markov property of the process in class $w_p$

(4.17)  $p(B_{ij} \mid B_{i,j-1}, B_{i-1,j}, B_{i-1,j-1}, w_p) = p(B_{ij} \mid B, w_p)$

Substituting (4.17) in (4.16) and multiplying both sides by (4.15)
results in

(4.18)  $p(B_{ij} \mid B, w_p)\, p(B \mid w_p) \ge p(B_1 \mid w_a) \cdots p(B_\ell \mid w_b)\, p(B_{ij} \mid w_k)$

But

(4.19)  $p(B_{ij} \mid B, w_p)\, p(B \mid w_p) = p(B_{ij}, B \mid w_p)$

hence (4.18) becomes

(4.20)  $p(B, B_{ij} \mid w_p) \ge p(B_1 \mid w_a) \cdots p(B_\ell \mid w_b)\, p(B_{ij} \mid w_k)$

for all a, b and $k \ne p$ and all subsets $B_m$. Thus (4.20) shows that when
(4.7) is satisfied then the segmentation remains optimal.

Similarly it can be shown that if (4.7) is not satisfied, the segmentation
will remain optimal.

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Appendix  A 

Let the zero mean 2-D Gaussian process $x(\cdot,\cdot)$ satisfy a $P$th order autoregressive model of the form

(A.1)  $x(i,j) = \sum_{k=0}^{P} \sum_{\ell=0}^{P} a_{k\ell}\, x(i-k, j-\ell) + U(i,j), \qquad k + \ell \neq 0,$

$E\,U(i,j) = 0, \qquad E\,U^2(i,j) = \sigma^2;$

then $x(\cdot,\cdot)$ is a Markov process having the property

(A.2)  $p[x(i,j) \mid x(i,j-1), \ldots, x(i-P,j-P), \ldots, x(i-P-m,j-P-m)] = p[x(i,j) \mid x(i,j-1), \ldots, x(i-P,j-P)]$

for any $m \geq 0$. From (A.2), then, we have

(A.3)  $E[x(i,j) \mid x(i,j-1), \ldots, x(i-P,j-P), \ldots, x(i-P-m,j-P-m)] = E[x(i,j) \mid x(i,j-1), \ldots, x(i-P,j-P)]$.

But from (A.1)

(A.4)  $E[x(i,j) \mid x(i,j-1), \ldots, x(i-P,j-P)] = \sum_{k=0}^{P} \sum_{\ell=0}^{P} a_{k\ell}\, x(i-k, j-\ell), \qquad k + \ell \neq 0.$

Now suppose for an order $P+m$ the modeling procedure of Section 3 finds the model:

(A.5)  $x(i,j) = \sum_{k=0}^{P+m} \sum_{\ell=0}^{P+m} \beta_{k\ell}\, x(i-k, j-\ell) + U'(i,j), \qquad k + \ell \neq 0,$

$E\,U'(i,j) = 0, \qquad E\,U'^2(i,j) = \sigma'^2.$

However, the minimum variance criterion of (3.9) necessitates that

(A.6)  $E[x(i,j) \mid x(i,j-1), \ldots, x(i-P,j-P), \ldots, x(i-P-m,j-P-m)] = \sum_{k=0}^{P+m} \sum_{\ell=0}^{P+m} \beta_{k\ell}\, x(i-k, j-\ell), \qquad k + \ell \neq 0.$

Finally, comparison of (A.4) and (A.6) with condition (A.3) necessitates that the coefficients $\beta_{k\ell}$ have the values

(A.7)  $\beta_{k\ell} = a_{k\ell}$ for $k \leq P$, $\ell \leq P$, and $\beta_{k\ell} = 0$ otherwise.

Substitution of (A.7) in (3.13) will result in

(A.8)  $\sigma'^2 = \sigma^2$,

hence proving the lemma that if the underlying stationary and Gaussian 2-D process can be modeled by a finite order autoregressive model, then the modeling procedure of Section 3.1 will result in that model.
Appendix B

Let the $P$th order model obtained from the modeling procedure be

(B.1)  $x(i,j) = \sum_{k=0}^{P} \sum_{\ell=0}^{P} a_{k\ell}\, x(i-k, j-\ell) + U(i,j), \qquad k + \ell \neq 0,$

$E\,U(i,j) = 0, \qquad E\,U^2(i,j) = \sigma^2.$

Let vectors $Z$ and $W$ be defined as

(B.2)  $W = (R_{01}\ R_{02}\ \ldots\ R_{10}\ R_{12}\ \ldots\ R_{PP}\ R_{00})^T$,

$Z = (a_{01}\ a_{02}\ \ldots\ a_{10}\ a_{12}\ \ldots\ a_{PP}\ \sigma^2)^T$.

Thus the first $(P+1)^2 - 1$ elements of $Z$ are the same as the elements of the vector $a$ in (3.11). This allows us to combine (3.11) and (3.13) and state that the model parameters are found by solving a system of $(P+1)^2$ linear equations of the form

(B.3)  $A_1 Z = Q_1$

where $A_1$ now is a $(P+1)^2 \times (P+1)^2$ matrix and the vectors $Z$ and $Q_1$ are of size $(P+1)^2 \times 1$. But the elements of $A_1$ and $Q_1$ are elements of the vector $W$, hence the set of equations in (B.3) is also a linear set of equations in $R_{01}, R_{02}, \ldots, R_{10}, R_{12}, \ldots, R_{PP}$ and $R_{00}$. Thus (B.3) can be rearranged to an equivalent form

(B.4)  $A_2 W = Q_2$

where now the elements of $A_2$ and $Q_2$ are the various elements of the vector $Z$, namely the model parameters.

Now suppose the first $(P+1)^2$ correlations that are generated by the model in (B.1) are $C_{01}, C_{02}, \ldots$ etc. and let

(B.5)  $\tilde{W} = (C_{01}\ C_{02}\ \ldots\ C_{10}\ C_{12}\ \ldots\ C_{PP}\ C_{00})^T$.

Since $x(\cdot,\cdot)$ is zero mean and stationary, the correlation values $C_{01}, C_{02}$, etc. must satisfy (3.10) and (3.13). This system of linear equations has the form

(B.6)  $A_2 \tilde{W} = Q_2$.

Finally, comparison of (B.4) and (B.6) yields

$\tilde{W} = W$

and thus the proof of the stated property.
Appendix C

Optimal  Segmentation  Criterion 

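For the sake of notational simplicity, the following discussion and derivations are presented in a one-dimensional setting. However, each step and the result hold true for two-dimensional signals as well. In an $M$ class environment $w_1, \ldots, w_M$, let

(C.1)  $\underline{x} = \{x_1, x_2, \ldots, x_N\}$

be a set of observed data. The segmentation problem, then, is the process of partitioning $\underline{x}$ into disjoint subsets $\underline{x}_1, \underline{x}_2, \ldots$ and assigning each subset to one of the classes $w_1, \ldots, w_M$ (one or more of the subsets can be empty). In accordance with Bayes' criterion of optimality, namely minimization of average loss, the average loss $\mathcal{L}$ incurred by partitioning $\underline{x}$ into two subsets $\underline{x}_1$ and $\underline{x}_2$ and assigning $\underline{x}_1$ to class $w_k$ and $\underline{x}_2$ to class $w_\ell$ is

(C.2)  $\mathcal{L} = L\{(w_k, w_\ell), (\underline{x}_1, \underline{x}_2)\} = \sum_{i=1}^{M} \sum_{j=1}^{M} C[(w_k, w_\ell) \mid (w_i, w_j)]\, P[(w_i, w_j) \mid (\underline{x}_1, \underline{x}_2)]$

where $C[(w_k, w_\ell) \mid (w_i, w_j)]$ is the cost associated with assigning $\underline{x}_1$, $\underline{x}_2$ to the classes $w_k$, $w_\ell$ while in fact they belong to classes $w_i$, $w_j$, respectively. Assuming a symmetric cost function for $C$ of the form

(C.3)  $C[(w_k, w_\ell) \mid (w_i, w_j)] = 1 - \delta(k-i, \ell-j)$

where $\delta(k-i, \ell-j) = 1$ if $k = i$ and $\ell = j$, and $0$ otherwise, and substituting (C.3) in (C.2) results in

(C.4)  $\mathcal{L} = \sum_{i=1}^{M} \sum_{j=1}^{M} P[(w_i, w_j) \mid (\underline{x}_1, \underline{x}_2)] - \sum_{i=1}^{M} \sum_{j=1}^{M} \delta(k-i, \ell-j)\, P[(w_i, w_j) \mid (\underline{x}_1, \underline{x}_2)]$

$= 1 - P[(w_k, w_\ell) \mid (\underline{x}_1, \underline{x}_2)]$

$= 1 - \dfrac{P[(\underline{x}_1, \underline{x}_2) \mid (w_k, w_\ell)]\, P(w_k, w_\ell)}{P(\underline{x}_1, \underline{x}_2)}$.

But by definition

(C.5)  $P(\underline{x}_1, \underline{x}_2) = P(\underline{x})$,

$P[(\underline{x}_1, \underline{x}_2) \mid (w_k, w_\ell)] = p(\underline{x}_1 \mid w_k)\, p(\underline{x}_2 \mid w_\ell)$ if $w_k \neq w_\ell$, and $p(\underline{x}_1, \underline{x}_2 \mid w_k)$ if $w_k = w_\ell$,

and assuming independent class occurrences

(C.6)  $P(w_k, w_\ell) = P(w_k)\, P(w_\ell)$ if $w_k \neq w_\ell$, and $P(w_k)$ if $w_k = w_\ell$.

So for a given partition $\underline{x}_1$, $\underline{x}_2$ of $\underline{x}$, the classification $\underline{x}_1 \in w_k$ and $\underline{x}_2 \in w_\ell$ is optimal if

(C.7)  $P[(\underline{x}_1, \underline{x}_2) \mid (w_k, w_\ell)]\, P(w_k, w_\ell) \geq P[(\underline{x}_1, \underline{x}_2) \mid (w_i, w_j)]\, P(w_i, w_j)$  for all $i, j = 1, \ldots, M$

where the densities on both sides satisfy (C.5) and (C.6).

The discussion, so far, has been based on what the optimal rule will be if one is given a two-segment partition $\underline{x}_1$ and $\underline{x}_2$ of the set $\underline{x}$. However, (C.4) holds true for all possible two-segment partitions of $\underline{x}$, denoted by $(\underline{x}_1^1, \underline{x}_2^1), \ldots, (\underline{x}_1^P, \underline{x}_2^P)$, where $P$ is the total number of such partitions. Hence a particular two-segment segmentation of $\underline{x}$ (partitioning and classification) of the form $\underline{x}_1^q \in w_k$ and $\underline{x}_2^q \in w_\ell$ is optimal if

(C.8)  $P[(\underline{x}_1^q, \underline{x}_2^q) \mid (w_k, w_\ell)]\, P(w_k, w_\ell) \geq P[(\underline{x}_1^m, \underline{x}_2^m) \mid (w_i, w_j)]\, P(w_i, w_j)$  for all $i, j = 1, \ldots, M$ and $m = 1, \ldots, P$

where, again, $P$ is the total number of possible two-segment partitions on $\underline{x}$. Finally, (C.2) through (C.8) can be expanded to include three or four or in general $s$-segment partitions on $\underline{x}$.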
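
The criterion (C.8) can be illustrated with a small exhaustive search. The sketch below is only an illustration under assumptions that are not part of the paper: the class-conditional densities are taken to be i.i.d. Gaussian with known mean and variance, and only contiguous splits of a one-dimensional sequence are searched rather than all two-segment partitions. The function names `gauss_loglik` and `best_two_segment` are hypothetical.

```python
import math

def gauss_loglik(xs, mean, var):
    """Log-likelihood of samples xs under an i.i.d. N(mean, var) class model (assumed)."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
               for x in xs)

def best_two_segment(x, classes, priors):
    """Search contiguous splits x[:c], x[c:] and labels (k, l) for the largest value of
    log P[(x1, x2) | (w_k, w_l)] + log P(w_k, w_l), in the spirit of (C.8)."""
    best = None
    for c in range(len(x) + 1):            # c = 0 or c = len(x) leaves one subset empty
        x1, x2 = x[:c], x[c:]
        for k, (m1, v1) in enumerate(classes):
            for l, (m2, v2) in enumerate(classes):
                # (C.5): under the assumed i.i.d. model the likelihood factorizes
                # over the two subsets whether or not k == l.
                loglik = gauss_loglik(x1, m1, v1) + gauss_loglik(x2, m2, v2)
                # (C.6): P(w_k, w_l) = P(w_k) P(w_l) if k != l, else P(w_k).
                logprior = math.log(priors[k])
                if k != l:
                    logprior += math.log(priors[l])
                score = loglik + logprior
                if best is None or score > best[0]:
                    best = (score, c, k, l)
    return best

data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
score, cut, k, l = best_two_segment(data, classes=[(0.0, 1.0), (5.0, 1.0)],
                                    priors=[0.5, 0.5])
```

The search returns the split point together with the pair of class labels, which corresponds to the joint choice of partition index $q$ and labels $(k, \ell)$ in (C.8).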
ABSTRACT

This  paper  concerns  parametric  mixture  models  appropriate  for  data 
presented  in  homogeneous  blocks  of  varying  sizes  from  several  unidentified 
source  populations.  For  most  applications,  the  data  elements  within  each 
block  are  dependent.  Models  are  proposed  for  multivariate  normal  data 
incorporating  two  types  of  dependence,  exchangeability  of  elements  within 
blocks,  and  a Markov  structure  for  blocks.  The  consequences  of  assuming 
exchangeability,  when  in  fact  the  Markov  structure  holds,  are  explored. 
Computational  problems  for  each  model  are  considered,  and  results  of  a 
simple test of the exchangeability hypothesis for LANDSAT data are presented.
Introduction

The mixture density estimation problem considered in this paper may be described as follows. A sample of $N$ independent observations $\theta_1, \ldots, \theta_N$ is given, each observation $\theta_i$ consisting of a positive integer $n_i$ (block size) and a $p \times n_i$ matrix

$X_i = (X_{i1} \mid \cdots \mid X_{i n_i})$

whose columns $X_{ij} \in \mathbb{R}^p$ are the basic experimental measurements. Each observation $\theta_i$ comes from one of $k$ populations $\Pi_1, \ldots, \Pi_k$, where $k$ is known but the population of origin of each observation is unknown. Let $q_\ell > 0$ denote the probability that an observation comes from $\Pi_\ell$.

Although the data blocks $X_i$ are independent, the basic measurements $X_{ij}$ within each block are possibly dependent. For applications in remote sensing of agricultural resources, the parameters of primary interest are $q_\ell$ and $E[n \mid \Pi_\ell]$, the mean block size for the $\ell$th population, where each block is a set of multispectral measurements from a single agricultural field belonging to a single crop class $\Pi_\ell$. The product $q_\ell E[n \mid \Pi_\ell]$ is related to the acreage in the sampling region covered by the class $\Pi_\ell$.

The procedures suggested herein are automatic procedures capable of handling large sample sizes $N$ as well as large dimensionality $p$, with human intervention restricted mainly to a posteriori description of classes. It should be possible to modify these procedures, along the lines indicated by Walker [11], to provide for the inclusion of a relatively small number of labelled samples, whose class origins are known, and perhaps to improve upon the estimates of the parameters derived from the labelled samples at a relatively small additional cost.

Let the observations be generically denoted by $\theta = (n, X)$ and let $f(n, x \mid \Pi_\ell)$ be the density function of $\theta$, given that $\theta$ comes from $\Pi_\ell$. Let $f(x \mid n, \Pi_\ell)$ be the density function of $X$, given $n$ and given that $\theta$ comes from $\Pi_\ell$, and let $f(n \mid \Pi_\ell)$ be the density of $n$ given population $\Pi_\ell$. The mixture density for $\theta$ is

(1.1)  $f(n, x) = \sum_{\ell=1}^{k} q_\ell f(n, x \mid \Pi_\ell) = \sum_{\ell=1}^{k} q_\ell f(n \mid \Pi_\ell)\, f(x \mid n, \Pi_\ell)$

and the log likelihood for the sample is

(1.2)  $L = \sum_{i=1}^{N} \log \sum_{\ell=1}^{k} q_\ell f(n_i \mid \Pi_\ell)\, f(x_i \mid n_i, \Pi_\ell)$.

We shall assume particular parametric forms for $f(n \mid \Pi_\ell)$ and $f(x \mid n, \Pi_\ell)$ which are simple enough that they are estimable from (1.2). In particular, we shall consider multivariate normal forms for $f(x \mid n, \Pi_\ell)$ which incorporate either exchangeability of observations within blocks or a first order autoregressive covariance structure. The consequences of the exchangeability hypothesis are presented in some detail, and the possibility of approximating the autoregressive form by exchangeability is considered. Finally, we present the results of a simple test of exchangeability for LANDSAT data.
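
As a minimal sketch of how the log likelihood (1.2) might be evaluated in practice, the function below takes the component log densities as precomputed arrays and combines them with a log-sum-exp step. The function name and the array layout are assumptions made for illustration; the paper does not prescribe an implementation.

```python
import numpy as np

def mixture_log_likelihood(q, log_f_n, log_f_x):
    """Evaluate (1.2): L = sum_i log sum_l q_l f(n_i | Pi_l) f(x_i | n_i, Pi_l).

    q        : (k,) mixing proportions, positive and summing to one
    log_f_n  : (N, k) array holding log f(n_i | Pi_l)
    log_f_x  : (N, k) array holding log f(x_i | n_i, Pi_l)
    """
    q = np.asarray(q, dtype=float)
    a = np.log(q)[None, :] + np.asarray(log_f_n, dtype=float) \
        + np.asarray(log_f_x, dtype=float)
    # Log-sum-exp over the k populations, shifted by the row maximum for stability.
    m = a.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(a - m).sum(axis=1))))
```

Working with log densities avoids underflow when the blocks are large, since $f(x_i \mid n_i, \Pi_\ell)$ is a density in $p\,n_i$ dimensions.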
Two Covariance Hypotheses

Throughout the remainder of this paper it will be assumed that $f(x \mid n, \Pi_\ell)$ is a $p \times n$-variate normal density function. To simplify notation, let $Y = (Y_1 \mid \cdots \mid Y_n)$ be a random $p \times n$ matrix having density $f(x \mid n, \Pi_\ell)$. We assume that the column process $Y_1, \ldots, Y_n$ of $Y$ is stationary with unknown mean $\mu_{n\ell}$ and covariance function $\Gamma_{n\ell}(h) = \mathrm{cov}(Y_j, Y_{j+h})$. Next to independence, the simplest assumption about $\Gamma_{n\ell}(h)$ is the exchangeability hypothesis that $Y$ and $YW$ have the same distribution for each $n \times n$ permutation matrix $W$ (to denote this we write $Y \overset{d}{=} YW$). In terms of $\Gamma_{n\ell}$ the exchangeability hypothesis can be formally expressed as

(E)  $\Gamma_{n\ell}(h) = \Sigma_{n\ell}$ if $h \neq 0$, and $\Gamma_{n\ell}(h) = \psi_{n\ell} + \Sigma_{n\ell}$ if $h = 0$,

for some (unspecified) symmetric $p \times p$ matrices $\psi_{n\ell}$ and $\Sigma_{n\ell}$ satisfying the conditions that $\psi_{n\ell}$ and $\psi_{n\ell} + n\Sigma_{n\ell}$ are positive definite.

Experiments in image texture generation [9] and studies of spatial correlation in LANDSAT images [4] suggest that the correlation of data elements as a function of spatial separation might be modeled as an autoregressive process of low order. Accordingly, as an alternative to (E), we are led to consider the hypothesis (M) that $\Gamma_{n\ell}(h)$ has a first order autoregressive, or Markov, structure:

(M)  $\Gamma_{n\ell}(h) = \Omega^{1/2} A^{|h|}\, \Omega^{1/2}$

for some unspecified positive definite $p \times p$ matrix $\Omega$ and symmetric $p \times p$ matrix $A$ with spectral radius less than one.

The theorems stated below exhibit some consequences of the exchangeability hypothesis which are of importance in computation and in testing the hypothesis. $J_n$ denotes the $n \times 1$ vector $(1, 1, \ldots, 1)^T$, while $I_n$ denotes the $n \times n$ identity matrix. $\mathcal{A}_n$ denotes the group of $n \times n$ orthogonal matrices $W$ such that $W J_n = J_n$.

Theorem 1: If $Y$ is a normally distributed $p \times n$ matrix whose distribution satisfies (E) then $YW \overset{d}{=} Y$ for each member $W$ of $\mathcal{A}_n$. If $P$ is an $n \times (n-1)$ matrix satisfying $P^T P = I_{n-1}$ and $P^T J_n = 0$, then $Z = YP$ has columns $Z_1, \ldots, Z_{n-1}$ which are independently distributed as $N_p(0, \psi_{n\ell})$. The statistics $\bar{Y} = \frac{1}{n} \sum_{j=1}^{n} Y_j$ and $S = \sum_{j=1}^{n} (Y_j - \bar{Y})(Y_j - \bar{Y})^T$ are independent, $\bar{Y}$ is normal $N_p(\mu_{n\ell},\ \Sigma_{n\ell} + \frac{1}{n}\psi_{n\ell})$ and $S$ has the Wishart distribution $W_p(n-1, \psi_{n\ell})$.
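
The algebraic part of Theorem 1 can be checked numerically. The sketch below builds one possible matrix $P$ with $P^T P = I_{n-1}$ and $P^T J_n = 0$ from a QR factorization (the choice of construction and the function names are assumptions; any matrix with these two properties would serve) and computes the block mean and scatter, so that the identity $S = ZZ^T$ with $Z = YP$ can be confirmed on arbitrary data.

```python
import numpy as np

def contrast_matrix(n):
    """Return an n x (n-1) matrix P with P^T P = I_{n-1} and P^T J_n = 0."""
    # Put J_n in the first column of a nonsingular matrix; after QR the first
    # column of Q is proportional to J_n and the remaining columns are orthonormal
    # and orthogonal to it.
    M = np.eye(n)
    M[:, 0] = 1.0
    Q, _ = np.linalg.qr(M)
    return Q[:, 1:]

def mean_and_scatter(Y):
    """Column mean and scatter S = sum_j (Y_j - Ybar)(Y_j - Ybar)^T of a p x n block."""
    ybar = Y.mean(axis=1, keepdims=True)
    D = Y - ybar
    return ybar[:, 0], D @ D.T

rng = np.random.default_rng(0)
Y = rng.normal(size=(3, 6))          # an arbitrary p x n block, p = 3, n = 6
P = contrast_matrix(6)
Z = Y @ P
ybar, S = mean_and_scatter(Y)
```

Because $PP^T = I_n - \frac{1}{n} J_n J_n^T$ for any such $P$, the product $ZZ^T$ does not depend on which $P$ is chosen.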

As a corollary of Theorem 1, if $n > p + 2$ and (E) is true, then the distribution of

$F = \dfrac{n-p-2}{p}\; Z_1^T \Big( \sum_{j=2}^{n-1} Z_j Z_j^T \Big)^{-1} Z_1$

is central $F_{p,\,n-p-2}$. This observation is used as a simple test of (E) described in a later section. It is interesting to note that the distribution of $F$ does not depend essentially on the normality of $Y$. Using results of A. P. Dawid [5] it can be shown that if $Y$ is any random $p \times n$ matrix such that $YW \overset{d}{=} Y$ for each $W \in \mathcal{A}_n$, and $\sum_{j=2}^{n-1} Z_j Z_j^T$ is almost surely positive definite, where $Z$ is defined in Theorem 1, then $F$ has the $F_{p,\,n-p-2}$ distribution. Therefore the test based on $F$ is a distribution free test for the invariance of the distribution of $Y$ under right multiplication by elements of $\mathcal{A}_n$.

By writing out the density of $Y$ under (E) it is easy to see that $(\bar{Y}, S)$ is sufficient for the family of all normal distributions satisfying exchangeability. Under very mild restrictions the sufficiency of $(\bar{Y}, S)$ implies (E). Thus, unless (E) holds for all source populations $\Pi_\ell$, some loss of estimation accuracy in the parameters of primary interest ($q_\ell$ and $E[n \mid \Pi_\ell]$) in the mixture model is to be expected when the data within blocks are condensed to block means and scatters.

Theorem 2: Let $\mathcal{F}$ be a family of normal distributions of a $p \times n$ matrix $Y$ and suppose that some member of $\mathcal{F}$ satisfies (E). If $(\bar{Y}, S)$ is sufficient for $\mathcal{F}$, then (E) holds for each member of $\mathcal{F}$.

Approximating the Markov Structure by Exchangeability

Even if the Markov assumption is more appropriate for applications, the computations involved in estimating the mixture parameters are very much simpler if exchangeability is assumed. In this section we will show that approximating the Markov form by exchangeability leads to certain conclusions about the dependence on $n$ of the covariance parameters $\psi$ and $\Sigma$ of (E).

Let $f(y)$ be the normal density of a $p \times n$ matrix $Y$ whose columns satisfy the Markov assumption with mean $\mu$ and covariance function $\Gamma(h) = \Omega^{1/2} A^{|h|}\, \Omega^{1/2}$. Let $\hat{f}(y)$ be a normal density satisfying (E) with column mean $\hat{\mu}$ and covariance function

$\hat{\Gamma}(h) = \hat{\Sigma}$ if $h \neq 0$, and $\hat{\Gamma}(h) = \hat{\psi} + \hat{\Sigma}$ if $h = 0$.

The degree to which $\hat{f}$ approximates $f$ is measured by the relative entropy

$H(\hat{f}, f) = \int_{\mathbb{R}^{pn}} f(y) \log \dfrac{f(y)}{\hat{f}(y)}\, dy$.

The relationship between this criterion and the $L^1$ distance, which might be considered more meaningful, is not very clear. The sharpest relationship we have been able to find is given in the next theorem. A corollary of the theorem is that if $H(\hat{f}_n, f) \to 0$ then $\int |\hat{f}_n - f| \to 0$, a result proved by Geman [8].

Theorem 3: Let $f$ and $\hat{f}$ be arbitrary density functions on $\mathbb{R}^m$. For each $\varepsilon > 0$,

$\dfrac{1}{2} \int_{\mathbb{R}^m} |f(y) - \hat{f}(y)|\, dy \leq \varepsilon + \dfrac{\varepsilon}{\varepsilon - \log(1 + \varepsilon)}\, H(\hat{f}, f)$.

It is straightforward to show that if expectations are taken with respect to the true density $f$, then

(3.1)  $E(\bar{Y}) = \mu$,

$\mathrm{cov}(\bar{Y}) = \dfrac{1}{n}\, \Omega^{1/2} B\, \Omega^{1/2}$,

and $E(S) = n\Omega - \Omega^{1/2} B\, \Omega^{1/2}$,

where $B = (I - A)^{-1}(I + A) - \dfrac{2}{n}(I - A)^{-2} A (I - A^n)$.

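The log-likelihood for the density $\hat{f}$ is

$\log \hat{f}(y) = -\dfrac{n-1}{2} \log|\hat{\psi}| - \dfrac{1}{2} \log|\hat{\psi} + n\hat{\Sigma}| - \dfrac{1}{2}\, \mathrm{tr}\, \hat{\psi}^{-1} S - \dfrac{n}{2}\, \mathrm{tr}\, (\hat{\psi} + n\hat{\Sigma})^{-1} (\bar{Y} - \hat{\mu})(\bar{Y} - \hat{\mu})^T$.

The parameters which maximize the expectation, with respect to $f$, of $\log \hat{f}(y)$ are

$\hat{\mu} = E(\bar{Y})$,

$\hat{\psi} = \dfrac{1}{n-1}\, E(S)$.

Combining these equations with equations (3.1), and replacing $\hat{\Sigma}$ by the new parameter $\hat{R} = \hat{\psi} + n\hat{\Sigma} = n\, \mathrm{cov}(\bar{Y})$, we have

Theorem 4: $H(\hat{f}, f)$ is minimized when

$\hat{\mu} = \mu$,

$\hat{\psi} = \dfrac{1}{n-1} \left( n\Omega - \Omega^{1/2} B\, \Omega^{1/2} \right)$,

$\hat{R} = \Omega^{1/2} B\, \Omega^{1/2}$,

where $B = (I - A)^{-1}(I + A) - \dfrac{2}{n}\, A (I - A)^{-2} (I - A^n)$.

Although it is not obvious, these parameters satisfy the required constraints; that is, $\hat{\psi}$ and $\hat{R}$ are positive definite. As $n \to \infty$, $\hat{R}$ and $\hat{\psi}$ tend to constants. This implies that $\hat{\Sigma}$ is $O(\frac{1}{n})$ for large $n$. We will make use of this observation in the next section.

The maximum value of $E[\log \hat{f}(Y)]$ is

$-\dfrac{n-1}{2} \log|\hat{\psi}| - \dfrac{1}{2} \log|\hat{R}| - \dfrac{np}{2}$,

where $\hat{\psi}$ and $\hat{R}$ are given in Theorem 4.

For large values of $n$ this is approximately

$-\dfrac{n}{2} \log|\Omega| - \dfrac{1}{2} \log|(I - A)^{-1}(I + A)| - \dfrac{np}{2}$.

Since

$E[\log f(Y)] = -\dfrac{n}{2} \log|\Omega| - \dfrac{n-1}{2} \log|I - A^2| - \dfrac{np}{2}$,

we have the following expression, for large values of $n$, for the minimum entropy:

$H(\hat{f}, f) \approx -\dfrac{n}{2} \log|I - A^2|$.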
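
The closed form for $B$ and the parameters of Theorem 4 lend themselves to a quick numerical check. The following sketch computes $B$, $\hat{\psi}$ and $\hat{R}$ for given $\Omega$, $A$ and $n$, so that $\hat{R}$ can be compared with $n\,\mathrm{cov}(\bar{Y})$ obtained by summing the Markov covariances directly. The function names are hypothetical and the example matrices are arbitrary illustrative values, not taken from the paper.

```python
import numpy as np

def sym_sqrt(M):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T

def markov_B(A, n):
    """B = (I - A)^{-1}(I + A) - (2/n)(I - A)^{-2} A (I - A^n)."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - A)
    return inv @ (I + A) - (2.0 / n) * inv @ inv @ A @ (I - np.linalg.matrix_power(A, n))

def exchangeable_fit(Omega, A, n):
    """psi_hat and R_hat as stated in Theorem 4 for a Markov block of n columns."""
    H = sym_sqrt(Omega)
    core = H @ markov_B(A, n) @ H
    psi_hat = (n * Omega - core) / (n - 1)
    return psi_hat, core

Omega = np.array([[2.0, 0.5], [0.5, 1.0]])   # illustrative values only
A = np.array([[0.5, 0.1], [0.1, 0.3]])
n = 7
psi_hat, R_hat = exchangeable_fit(Omega, A, n)

# Direct evaluation of n * cov(Ybar) = (1/n) sum_{i,j} Omega^{1/2} A^{|i-j|} Omega^{1/2}.
H = sym_sqrt(Omega)
direct = sum(H @ np.linalg.matrix_power(A, abs(i - j)) @ H
             for i in range(n) for j in range(n)) / n
```

The implied $\hat{\Sigma}$ is then $(\hat{R} - \hat{\psi})/n$, which makes the $O(\frac{1}{n})$ behaviour noted above easy to inspect for increasing $n$.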


Estimating  the  Mixture  Parameters 

The most successful method for estimating the parameters in a mixture of distributions from a single exponential family is maximum likelihood [10]. When the component distributions of the mixture are parametrized in the right way, the EM procedure has a very natural and easily implemented formulation [10], [6]. For density functions $f(x \mid n, \Pi_\ell)$ corresponding to the Markov assumption the likelihood equations for the mixture parameters are extremely complicated, and there is no obvious alternative to using a standard optimization procedure to maximize the likelihood function. There are difficulties involved in obtaining exact maximum likelihood estimates with a sample sequence from a single autoregressive series (see [7, p. 329] and [1]), and it is reasonable to think that these problems will be compounded in the mixture setting proposed, resulting in multiple solutions, slow convergence, etc. In general, the situation when $f(x \mid n, \Pi_\ell)$ satisfies the exchangeability condition is not much better; however, the special case wherein $\Sigma_{n\ell} = \frac{1}{n}\Sigma_\ell$ and $\psi_{n\ell} = \psi_\ell$, and $\Sigma_\ell$ and $\psi_\ell$ are independent of $n$, is amenable to solution by the EM procedure. For large values of $n$ these assumptions are consistent with the remarks at the end of the last section, if the Markov assumption holds with parameters independent of $n$.

Let each $f(x \mid n, \Pi_\ell)$ have the form (E) with mean $\mu_{n\ell} = \mu_\ell$ and covariance parameters $\psi_{n\ell} = \psi_\ell$, $\Sigma_{n\ell} = \frac{1}{n}\Sigma_\ell$. Define $R_\ell = \psi_\ell + \Sigma_\ell$. Then $\frac{1}{n} R_\ell$ is the covariance matrix of the column-mean $\bar{X}$ of an observed block of measurements given that the observation comes from $\Pi_\ell$ and given the block size $n$. Suppose the density $f(n \mid \Pi_\ell)$ is from an exponential family

$f(n \mid \Pi_\ell) = C(\lambda_\ell)\, h(n)\, e^{F(\lambda_\ell) t(n)}, \qquad n = 1, 2, \ldots$

where the parameter $\lambda_\ell$ is the expected value of $t(n)$ under $f(n \mid \Pi_\ell)$ [3]. From (1.1) and (1.2) the derivative of the log-likelihood with respect to $\lambda_\ell$ is

(4.1)  $\dfrac{\partial L}{\partial \lambda_\ell} = \sum_{i=1}^{N} \dfrac{q_\ell f(n_i, X_i \mid \Pi_\ell)}{f(n_i, X_i)} \left[ \dfrac{C'(\lambda_\ell)}{C(\lambda_\ell)} + F'(\lambda_\ell)\, t(n_i) \right]$.

By differentiating the equation

$\sum_{n} C(\lambda_\ell)\, h(n)\, e^{F(\lambda_\ell) t(n)} = 1$

with respect to $\lambda_\ell$, one sees that

$\dfrac{C'(\lambda_\ell)}{C(\lambda_\ell)} = -F'(\lambda_\ell)\, \lambda_\ell$

(see [3]). Hence $\dfrac{\partial L}{\partial \lambda_\ell} = 0$ if and only if

(4.2)  $\lambda_\ell = \sum_{i=1}^{N} \dfrac{f(n_i, X_i \mid \Pi_\ell)}{f(n_i, X_i)}\, t(n_i) \Big/ \sum_{i=1}^{N} \dfrac{f(n_i, X_i \mid \Pi_\ell)}{f(n_i, X_i)}$.

Similarly, by considering $\dfrac{\partial L}{\partial q_\ell}$, one sees that for a maximum of $L$ we must have

(4.3)  $q_\ell = \dfrac{q_\ell}{N} \sum_{i=1}^{N} \dfrac{f(n_i, X_i \mid \Pi_\ell)}{f(n_i, X_i)}$.

Now let $\bar{X}_i$ and $S_i$ be the mean and scatter of the columns of $X_i$. Then

$\dfrac{\partial}{\partial \mu_\ell} f(n_i, X_i \mid \Pi_\ell) = f(n_i, X_i \mid \Pi_\ell)\; n_i R_\ell^{-1} (\bar{X}_i - \mu_\ell)$,

$\dfrac{\partial}{\partial \psi_\ell} f(n_i, X_i \mid \Pi_\ell) = f(n_i, X_i \mid \Pi_\ell) \left[ -\dfrac{n_i - 1}{2}\, \psi_\ell^{-1} + \dfrac{1}{2}\, \psi_\ell^{-1} S_i\, \psi_\ell^{-1} \right]$,

$\dfrac{\partial}{\partial R_\ell} f(n_i, X_i \mid \Pi_\ell) = f(n_i, X_i \mid \Pi_\ell) \left[ -\dfrac{1}{2}\, R_\ell^{-1} + \dfrac{n_i}{2}\, R_\ell^{-1} (\bar{X}_i - \mu_\ell)(\bar{X}_i - \mu_\ell)^T R_\ell^{-1} \right]$.

From these equations it follows that the derivatives of $L$ with respect to $\mu_\ell$, $\psi_\ell$ and $R_\ell$ all vanish when

(4.4)  $\mu_\ell = \sum_{i=1}^{N} n_i\, \dfrac{f(n_i, X_i \mid \Pi_\ell)}{f(n_i, X_i)}\, \bar{X}_i \Big/ \sum_{i=1}^{N} n_i\, \dfrac{f(n_i, X_i \mid \Pi_\ell)}{f(n_i, X_i)}$,

(4.5)  $\psi_\ell = \sum_{i=1}^{N} \dfrac{f(n_i, X_i \mid \Pi_\ell)}{f(n_i, X_i)}\, S_i \Big/ \sum_{i=1}^{N} (n_i - 1)\, \dfrac{f(n_i, X_i \mid \Pi_\ell)}{f(n_i, X_i)}$,

(4.6)  $R_\ell = \sum_{i=1}^{N} \dfrac{f(n_i, X_i \mid \Pi_\ell)}{f(n_i, X_i)}\, n_i (\bar{X}_i - \mu_\ell)(\bar{X}_i - \mu_\ell)^T \Big/ \sum_{i=1}^{N} \dfrac{f(n_i, X_i \mid \Pi_\ell)}{f(n_i, X_i)}$.

The iterative procedure suggested by equations (4.2)-(4.6), namely, evaluating the right hand sides with the estimates $\lambda_\ell^{(j)}$, $q_\ell^{(j)}$, $\mu_\ell^{(j)}$, $\psi_\ell^{(j)}$, $R_\ell^{(j)}$ at the $j$th step, to obtain the estimates $\lambda_\ell^{(j+1)}$, $q_\ell^{(j+1)}$, $\mu_\ell^{(j+1)}$, $\psi_\ell^{(j+1)}$, $R_\ell^{(j+1)}$ at the $(j+1)$st step, can be shown to be a slightly modified EM procedure (see [10] and [6]).
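
One pass of the iteration suggested by (4.2)-(4.6) might be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the weights $w_{i\ell} = q_\ell f(n_i, X_i \mid \Pi_\ell)/f(n_i, X_i)$ are assumed to have been computed already from the current estimates, the statistic $t(n)$ is taken to be $n$ itself, and the function name `update_parameters` is hypothetical. Since $q_\ell$ cancels in the ratios of (4.2) and (4.4)-(4.6), using the weights $w_{i\ell}$ in place of $f(n_i, X_i \mid \Pi_\ell)/f(n_i, X_i)$ gives the same values there.

```python
import numpy as np

def update_parameters(w, n, xbar, S, mu_prev):
    """One pass of (4.2)-(4.6), with t(n) = n as an illustrative choice.

    w       : (N, k) weights q_l f(n_i, X_i | Pi_l) / f(n_i, X_i) at the current step
    n       : (N,) block sizes
    xbar    : (N, p) block column means
    S       : (N, p, p) block scatter matrices
    mu_prev : (k, p) current means, used on the right-hand side of (4.6)
    """
    N, k = w.shape
    wsum = w.sum(axis=0)                                   # (k,)
    wn = w * n[:, None]                                    # weights times block size
    q = wsum / N                                           # (4.3)
    lam = wn.sum(axis=0) / wsum                            # (4.2) with t(n) = n
    mu = (wn.T @ xbar) / wn.sum(axis=0)[:, None]           # (4.4)
    denom_psi = (w * (n[:, None] - 1)).sum(axis=0)
    psi = np.einsum('il,ipq->lpq', w, S) / denom_psi[:, None, None]        # (4.5)
    d = xbar[:, None, :] - mu_prev[None, :, :]             # (N, k, p) deviations
    R = np.einsum('il,ilp,ilq->lpq', wn, d, d) / wsum[:, None, None]       # (4.6)
    return q, lam, mu, psi, R
```

Passing `mu_prev` separately follows the description in the text, where every right-hand side is evaluated at the step-$j$ estimates; substituting the freshly updated mean in (4.6) would be a different variant.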
Testing the Exchangeability Hypothesis


Standard testing procedures for the two covariance hypotheses considered would require large block sizes $n_i$ and a large sample of observations segregated as to block size and type. The remarks at the end of the second section concerning the distribution of the statistic $F$ under the hypothesis (E) suggest a test which is much easier to implement.

For the $i$th block of measurements $X_i$, let $Z_i = (Z_{i1} \mid \cdots \mid Z_{i,n_i-1}) = X_i P_i$, where $P_i$ is an $n_i \times (n_i - 1)$ matrix satisfying the conditions given in Theorem 1. Let

$F_i = \dfrac{n_i - p - 2}{p}\; Z_{i1}^T \Big( \sum_{j=2}^{n_i - 1} Z_{ij} Z_{ij}^T \Big)^{-1} Z_{i1}$.

If (E) holds for all classes then each $F_i$ is distributed as $F_{p,\,n_i-p-2}$. Thus the number of observed blocks for which $F_i$ falls in some given quantile range of its distribution can be tabulated and compared to its expected value. Table 1 shows these comparisons for 216 quasi-fields of LANDSAT agricultural data from LACIE segment 1645 and 57 quasi-fields from LACIE segment 1633. The quasi-fields are those found by an automatic image segmentation program (AMOEBA) and may not be representative of real agricultural fields. The given $\chi^2$ goodness of fit statistics are significant at levels between 10% and 20%. The hypothesis (E) appears to be rather weakly disconfirmed for this data.
TABLE 1 - Distribution of F-Ratios


Segment 1645 - 216 Fields

Percentiles    0 - 5%    5 - 10%    10 - 90%    90 - 95%    95 - 100%
Number         18        14         163         9           12
Frequency      8.2%      6.5%       75.5%       4.2%        5.6%

$\chi^2$ = 6.72


Segment 1633 - 57 Fields

Percentiles    0 - 5%    5 - 10%    10 - 90%    90 - 95%    95 - 100%
Number         6         1          44          4           2
Frequency      10.5%     1.8%       77.2%       7.0%        3.5%

$\chi^2$ = 5.45
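
The goodness of fit statistics in Table 1 can be recomputed from the tabulated counts. The short sketch below assumes the usual Pearson statistic with expected cell probabilities equal to the widths of the percentile ranges (5%, 5%, 80%, 5%, 5%); the function name `chi_square` is a hypothetical helper.

```python
def chi_square(counts, probs):
    """Pearson goodness-of-fit statistic for observed counts and expected cell probabilities."""
    total = sum(counts)
    return sum((c - total * p) ** 2 / (total * p) for c, p in zip(counts, probs))

probs = [0.05, 0.05, 0.80, 0.05, 0.05]   # widths of the percentile ranges in Table 1
seg_1645 = [18, 14, 163, 9, 12]          # counts for segment 1645
seg_1633 = [6, 1, 44, 4, 2]              # counts for segment 1633

chi2_1645 = chi_square(seg_1645, probs)
chi2_1633 = chi_square(seg_1633, probs)
```

Under this assumption the two statistics agree with the tabulated values of 6.72 and 5.45 to within about 0.02, a difference that is plausibly due to rounding in the original computation.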
Appendix

Proofs  of  the  Theorems 

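Proof of Theorem 1: The covariance of $Y$ can be written as $\psi_{n\ell} \otimes I_n + \Sigma_{n\ell} \otimes J_n J_n^T$, where $\otimes$ denotes the Kronecker product. For $W \in \mathcal{A}_n$, $YW = (I_p \otimes W^T)(Y)$ has covariance

$(I_p \otimes W^T)(\psi_{n\ell} \otimes I_n + \Sigma_{n\ell} \otimes J_n J_n^T)(I_p \otimes W) = \psi_{n\ell} \otimes I_n + \Sigma_{n\ell} \otimes J_n J_n^T$.

The mean of $YW$ is $\mu_{n\ell} J_n^T W = \mu_{n\ell} J_n^T$. Therefore, $YW \overset{d}{=} Y$. By a similar argument, if $P^T J_n = 0$, $P^T P = I_{n-1}$ and $Z = YP$, then $E(Z) = 0$ and

$\mathrm{cov}(Z) = (I_p \otimes P^T)(\psi_{n\ell} \otimes I_n + \Sigma_{n\ell} \otimes J_n J_n^T)(I_p \otimes P) = \psi_{n\ell} \otimes I_{n-1}$.

Therefore the columns of $Z$ are independently distributed as $N_p(0, \psi_{n\ell})$. To prove the last assertion let

$Q = (n^{-1} J_n \mid P)_{n \times n}$

where $P$ has the same properties as above. In block form, the covariance of $YQ = (\bar{Y} \mid Z)$ is

$\begin{pmatrix} n^{-1}\psi_{n\ell} + \Sigma_{n\ell} & 0 \\ 0 & \psi_{n\ell} \otimes I_{n-1} \end{pmatrix}$.

Therefore, $\bar{Y}$ and $Z$ are independent and $\bar{Y} \sim N_p(\mu_{n\ell},\ \frac{1}{n}\psi_{n\ell} + \Sigma_{n\ell})$. Moreover, $S = ZZ^T$ and by the first part of the theorem $S \sim W_p(n-1, \psi_{n\ell})$.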

Proof of Theorem 2: Let $f_0$ be a density function in $\mathcal{F}$ satisfying the hypothesis (E). Define

$h_f(y) = f(y) / f_0(y)$

for $f \in \mathcal{F}$. By a version of the Neyman-Fisher theorem (Theorem 6.1 of [2]), if $(\bar{Y}, S)$ is sufficient,

$h_f(y) = g_f(\bar{y}, S)$

almost everywhere, where $g_f$ is a Borel measurable function on the space of $(\bar{Y}, S)$. For a given $f \in \mathcal{F}$ and $W \in \mathcal{A}_n$, the set

$U = \{y \mid h_f(y) \neq h_f(yW)\}$

is an open set contained in $B_1 \cup B_2$, where

$B_1 = \{y \mid h_f(y) \neq g_f(\bar{y}, S)\}$,

and

$B_2 = B_1 W^{-1} = \{y \mid h_f(yW) \neq g_f(\bar{y}, S)\}$.

By Theorem 1, the probability measure $\lambda_0$ corresponding to $f_0$ is invariant under $\mathcal{A}_n$. Since $\lambda_0(B_1) = 0$ it follows that $\lambda_0(B_2) = 0$ also, and hence $\lambda_0(U) = 0$. Therefore $U$ is empty and $h_f$ is an invariant function. This implies that each $f \in \mathcal{F}$ is invariant under $\mathcal{A}_n$ and must satisfy (E).

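Proof of Theorem 3: The function

$g(\varepsilon) = \dfrac{\varepsilon}{\varepsilon - \log(1 + \varepsilon)}$

is positive and strictly decreasing on $(0, \infty)$. Thus, if $\hat{f}/f - 1 > \varepsilon$ we have

$\dfrac{\hat{f}}{f} - 1 < g(\varepsilon) \left[ \dfrac{\hat{f}}{f} - 1 - \log \dfrac{\hat{f}}{f} \right]$.

Hence

$\int_{\hat{f} > f} (\hat{f} - f) = \int_{0 < \hat{f}/f - 1 \leq \varepsilon} \Big( \dfrac{\hat{f}}{f} - 1 \Big) f + \int_{\hat{f}/f - 1 > \varepsilon} \Big( \dfrac{\hat{f}}{f} - 1 \Big) f$

$\leq \varepsilon + g(\varepsilon) \int_{\hat{f}/f - 1 > \varepsilon} \left[ \dfrac{\hat{f}}{f} - 1 - \log \dfrac{\hat{f}}{f} \right] f$

$\leq \varepsilon + g(\varepsilon) \int_{\mathbb{R}^m} \left[ \dfrac{\hat{f}}{f} - 1 - \log \dfrac{\hat{f}}{f} \right] f$

$= \varepsilon + g(\varepsilon) \int_{\mathbb{R}^m} f \log \dfrac{f}{\hat{f}}$

$= \varepsilon + g(\varepsilon)\, H(\hat{f}, f)$,

where the extension to all of $\mathbb{R}^m$ uses the fact that $t - 1 - \log t \geq 0$ for $t > 0$, and the next equality uses $\int (\hat{f} - f) = 0$. Therefore,

$\dfrac{1}{2} \int_{\mathbb{R}^m} |\hat{f} - f| = \int_{\hat{f} > f} (\hat{f} - f) \leq \varepsilon + g(\varepsilon)\, H(\hat{f}, f)$.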
ABSTRACT

Multi-channel  Landsat  data  are  collected  in  several  passes  over 
agricultural  areas  during  the  growing  season.  This  paper  considers 
how  empirical  Bayes  modeling  can  be  used  to  develop  crop  identifica- 
tion and  discrimination  techniques  that  account  for  spatial  correla- 
tion in  such  data.  Our  approach  models  the  unobservable  parameters 
and  the  data  separately,  hoping  to  take  advantage  of  the  fact  that  the 
bulk of spatial correlation lies in the parameter process. The problem
is then framed in terms of estimating posterior probabilities
of  crop  types  for  each  spatial  area.  Some  empirical  Bayes  spatial 
estimation  methods  developed  earlier  for  this  project  are  used  to 
estimate  the  logits  of  these  probabilities. 


1.  Introduction 


Multi-channel satellite image data, collected by Landsat, are recorded as a multivariate (four dimensional, for four channels) time series (multiple passovers, five to seven times, spanning a several month long growing season) in two spatial dimensions. These data are part of the "fundamental research data base" described in an appendix to Guseman (1983), each file covering a 30 square nautical mile agricultural site divided into 22,932 pixels (picture segments, which are the measurement units). Also available for each site is "ground truth", being discrete (categorical) parameters indicating crop or ground cover type. Continuous parameters might additionally need to be estimated, but only discrete parameters are considered in this paper.

Figure 1 illustrates the set-up, centering on pixel $i$. There, $Y_i$ might most generally be the $20 = 4 \times 5$ dimensional vector consisting of responses for four channels and five acquisition times. Here we often will assume that this dimension is reduced, perhaps by using Badhwar transformations (Badhwar, 1982) or a linear summary of the data. Thus $Y_i$ may be univariate or multivariate. Pixel $i$ has coordinates $x_i = (x_{i1}, x_{i2})'$, and ground truth parameter $\theta_i$. These parameters label crop types, which, of course, are highly correlated with those in nearby pixels due to spatial continuity of crop types.


Our goal is to estimate the probabilities of each crop type for each pixel, using the data $\{Y_i\}$, incorporating the spatial information. That is, we must determine for each $i = 1, \ldots, n$ ($n$ = number of pixels), the probabilities

(1.1)  $P(\theta_i = m \mid \text{data})$, $m = 1, 2, \ldots, M$ = no. crop types.

Having  the  classification  probabilities  (1.1)  permits  construction  of 
a "probability  map"  of  crop  types.  This  formulation  also  handles 
"split  pixels"  naturally,  interpreting  probabilities  as  fractions  of 
each  crop  type  in  a pixel. 

One  can  use  the  probability  map  to  answer  many  questions.  The 
fraction  of  a crop  type  may  be  obtained  for  any  specified  region  by 
summing  probabilities  for  the  relevant  pixels.  Field  boundaries  may 
be  determined  as  occurring  when  classification  probabilities  change 
abruptly.  Thus  we  concentrate  on  the  classification  probabilities, 
by  pixel.  Of  course,  the  spatial  methods  developed  here  are  applica- 
ble to  groups  of  pixels,  as  well  as  to  pixels  themselves,  and  the  best 
grouping  size  must  be  considered.  For  simplicity  of  exposition, 
however,  the  remainder  of  the  discussion  will  be  framed  in  terms  of 
pixels. 

Formula  (1.1)  suggests  a Bayesian  type  of  calculation.  We  shall 
consider  Bayesian  and  empirical  Bayesian  (EB)  approaches  to  this 
problem. 
2.  Empirical  Bayes  Modeling

Empirical  Bayes  models,  Morris  (1983b),  involve  two  stochastic 
processes:  one  for  the  parameters  e,  and  one  for  the  data  Y.  In 
general,  we  assume  that 

(2.1)  Y = ' lY-j > has  density  f(yje)  if  the  true 

values  are  0 = {0^} 

(2.2)  0 =’  {0^}  has  density  ir(0),  with  tt  e n, 

a class  of  "priors." 

We  call  this  an  empirical  Bayes  (EB)  statistical  problem.  It  is  a 
parametric  empirical  Bayes  (PEB)  problem  if  n =”{*  on  0 = parameter 
set:  a e G},  G a parameter  set  describing  the  prior. 

The  marginal  distribution 

(2.3)  h(.y|a)  = f f (y 1 8) tt ( 0 } a)d0 

G 

provides  a basis  for  estimating  a e G,  and  for  estimating  Bayes  rules, 
e.g.,  for  estimating  the  Bayes  estimator 

(2.4)  ea=E[0|Y,a]. 

In  Landsat  applications,  however,  the  parameters  0 will  correspond  to 
crop  labels,  and  thus  it  is  more  meaningful  to  replace  (2.4)  by  (2.5) 
and  estimate  the  posterior  probabilities: 
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(2.5)  PY(a)  = P(e  = m|Y,o). 

Note  that  because  a (index  of  the  stochastic  process  determining  e) 
is  unknown,  (2.5)  is  a quantity  requiring  estimation. 

Empirical Bayes theory assumes that the prior distributions (2.2)
exist, but $\pi \in \Pi$ is not known ($\Pi$ will be highly restricted
relative to all priors on $\theta$, however). This differs from the Bayes
approach in that the data are used to estimate the prior. Methods that
result from this approach, however, also often have good frequency
operating characteristics, e.g., James-Stein (1962), Efron-Morris
(1973, 1975). Spatial applications suggest that the prior densities
$\pi$ incorporate dependence between the parameters $\theta_i$.

The most developed examples of (2.1), (2.2) include
$Y_i \mid \theta_i \overset{ind}{\sim} N(\theta_i, V)$, $V$ known, and
$\theta_i \mid \alpha \overset{ind}{\sim} N(z_i'\beta, A)$, $\alpha = (\beta, A)$,
$A \ge 0$, $\beta \in R^p$, $z_i$ a known vector. In spatial applications,
$z_i$ will depend on $x_i$. The estimate of the mean

(2.6)  $\hat\theta_i = (1 - \hat B) Y_i + \hat B (z_i'\hat\beta)$

with $\hat B$ and $\hat\beta$ estimated from the marginal distribution of $Y$
is an empirical Bayes version of Stein's estimator, which has been proved
superior to the estimator $Y_i$. Generalizations and other applications
of this theory are reviewed in Morris (1983b) and Efron-Morris (1975).
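The estimator (2.6) can be sketched numerically. The fragment below is only an illustration under stated assumptions: the regression vector is taken to be $z_i = (1, x_i)'$ (so $r = 2$), the shrinkage factor is the familiar positive-part form $\hat B = \min\{1, (k - r - 2)V / S\}$ with $S$ the residual sum of squares, and the function name `eb_stein` is hypothetical.

```python
def eb_stein(y, x, v):
    """Shrink each y[i] toward a fitted line, in the spirit of (2.6).

    Assumptions (not from the source): z_i = (1, x_i)', r = 2, and a
    positive-part shrinkage factor B = min(1, (k - r - 2) * v / S).
    """
    k = len(y)
    r = 2  # intercept and slope
    x_bar = sum(x) / k
    y_bar = sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]        # z_i' beta-hat
    s = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    b_hat = 1.0 if s == 0 else min(1.0, max(0.0, (k - r - 2) * v / s))
    theta = [(1 - b_hat) * yi + b_hat * fi for yi, fi in zip(y, fitted)]
    return theta, b_hat


theta_hat, b_hat = eb_stein([0.0, 2.0, 1.0, 4.0, 3.0, 5.0],
                            [0, 1, 2, 3, 4, 5], v=1.0)
```

Each estimate lies between the raw value $Y_i$ and the fitted value $z_i'\hat\beta$; the noisier the data relative to the spread of the residuals, the closer $\hat B$ is to 1.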

Empirical Bayes applications to spatial problems have been
particularly plentiful. Examples cited in (Morris, 1983b) include:

(a)  Revenue Sharing. Fay and Herriot (1979) show that estimates
of per capita census income in "small areas" can be improved
by combining data from neighboring areas.

(b)  Insurance. The insurance industry uses "credibility"
(empirical Bayes) methods to determine to what extent risks
in neighboring territories should be used to estimate risks
in a particular territory.

(c)  Fire Alarms. Carter and Rolph (1974) develop empirical Bayes
estimates for spatial data (alarm box locations) to determine
better estimates of false alarm rates.

(d)  Epidemiology. Efron and Morris (1975) show that empirical
Bayes estimates of toxoplasmosis prevalence improve substantially
upon area-specific estimates in El Salvador.

(e)  Forestry.  Burk  and  Ek  (1980)  improve  sample  estimates  of 
forestry  volume  for  specific  areas  by  developing  empirical 
Bayes  estimates  that  use  information  from  neighboring  areas. 

In these cases empirical Bayes methods were demonstrated to work
better than standard methods in the most convincing way: by showing
that, had they been used with real data, better predictions and
decisions would have resulted. The demonstrated success of these
spatial  empirical  Bayes  applications  encourages  interest  in  developing 
and  extending  such  methodology  for  remotely  sensed  image  spatial  data. 
However,  this  latter  application  is  substantially  more  complex  than 
its  predecessors,  and  therefore  substantial  additional  development 
will  be  required. 


3.  An  Approach  to  Estimating  Spatial  Probabilities 

The  empirical  Bayes  framework  models  parameters  and  observations 
as being realizations of separate stochastic processes. This section
considers these processes in more depth, in the context of Landsat
data. 

3.1. The parameter process. The bulk of Landsat spatial information
is captured in the parameter process, i.e. in the distribution
of crop labels. Statistical procedures that incorporate this
information will perform better than those that ignore it. In practice,
the parameter process is unobservable. However, "ground truth" data
are available from Landsat experiments, and may be used to construct
discriminant procedures.

The ground truth discrete parameter process is very complicated,
involving the distribution of areal segments and the crop types within
them. Work on this project by M. Naraghi (on random fields), by
H.  J.  Newton  (on  spatially  homogeneous  processes),  and  by  H.  P.  Decell, 
Jr.  and  C.  Peters  (on  special  covariance  structures),  is  reported  in 
(Guseman,  1983). 

These  papers  provide  approaches  to  modeling  the  covariance,  or 
autoregressive  structure  required  for  spatial  parameter  processes. 
However,  we  additionally  require  those  to  be  discrete  categorical 
processes,  thereby  introducing  further  modifications. 

The simplest labeling processes are those that involve only two
labels, "zero-one processes". At various initial stages in this
research, it is desirable to consider simplified models, binary
processes being one possible choice. Autologistic models provide another
method for modeling zero-one data (Ripley, 1981).

The  empirical  Bayes  approach  permits  unknown  parameters  to  exist 
in  the  prior  distribution,  requiring  that  their  values  be  estimated 
from  data  available  in  the  actual  application.  Thus,  one  needn't 
completely  specify  the  parameter  process. 

3.2. The data process. Data $\{y_i\}$ are provided for each pixel,
with distributions dependent on the parameter values. Spatial
information in this process is important only if it affects the
conditional distribution of $\{y_i\}$ given $\{\theta_i\}$. Spatial
correlation induced in the $\{y_i\}$ values via the $\{\theta_i\}$
correlations alone is most easily ignored, and therefore is a desirable
simplification, if the data permit.

If the spatial aspects of the y values (permitted to be continuous)
can be justifiably ignored, then we may use data for which ground
truth is available to estimate the density function of the intensity
measurements associated with crop type m:

(3.1)  $f_m(y)$, $m = 1, 2, \ldots, M$.

These density functions might be adequately estimated as sample
proportions in certain cases, but more effective choices are likely to
result from density smoothing procedures, for example, as discussed
for Landsat data by D. W. Scott (for multi-dimensional data) and
E. Parzen (univariate and multivariate density quantile estimators),
both in (Guseman, 1983). Also see Wahba (1981).

Now consider the following implementation of these ideas. We will
use $y_i$, the data in pixel i alone, to estimate

(3.2)  $p_i = P(\theta_i = 1 \mid y_i)$

among the possible labels $m = 1, 2, \ldots, M$. Note that, in this
approach, $y_i$ may be multivariate. Any time-aspects of Landsat data
are ignored, for now. Let $\pi_1, \ldots, \pi_M$ be the prior probabilities
of the M crop types. Then we may calculate (3.2), using Bayes' rule:

(3.3)  $p_i = \dfrac{\pi_1 f_1(y_i)}{\sum_{m=1}^{M} \pi_m f_m(y_i)}$.

This may be viewed, without essential loss of generality, as a
two-label parameter process, by collapsing the last M-1 labels into
one "null" label:

(3.4)  $\pi_0 = \pi_2 + \ldots + \pi_M$,

and

(3.5)  $f_0(y) = \sum_{j=2}^{M} \pi_j f_j(y) / \pi_0$.

Letting

(3.6)  $z_i = \log\left(\dfrac{p_i}{1 - p_i}\right)$,  $z^1 = \log\left(\dfrac{\pi_1}{\pi_0}\right)$,

we have, equivalent to (3.3),

(3.7)  $z_i = z^1 + \log\left[\dfrac{f_1(y_i)}{f_0(y_i)}\right]$.

Thus, the familiar logarithm of likelihood ratio estimates the
log-odds (logit) of (3.2).
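The equivalence of (3.3) and (3.7) is easy to check numerically. The sketch below assumes, purely for illustration, three normal class densities with made-up means, standard deviations and priors; the helper names are hypothetical.

```python
import math


def normal_pdf(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior_label1(y, priors, densities):
    """Bayes' rule (3.3): posterior probability of the first label."""
    weighted = [p * f(y) for p, f in zip(priors, densities)]
    return weighted[0] / sum(weighted)


def logit_from_likelihood_ratio(y, priors, densities):
    """Log-odds (3.7) after collapsing labels 2..M into a null label."""
    pi1 = priors[0]
    pi0 = sum(priors[1:])                                    # (3.4)
    f0 = sum(p * f(y) for p, f in zip(priors[1:], densities[1:])) / pi0  # (3.5)
    return math.log(pi1 / pi0) + math.log(densities[0](y) / f0)


# Illustrative (hypothetical) class densities and priors.
priors = [0.5, 0.3, 0.2]
densities = [
    lambda y: normal_pdf(y, 56.5, 6.0),
    lambda y: normal_pdf(y, 47.5, 6.0),
    lambda y: normal_pdf(y, 40.0, 5.0),
]
p = posterior_label1(50.0, priors, densities)
z = logit_from_likelihood_ratio(50.0, priors, densities)
```

Here `math.log(p / (1 - p))` and `z` agree to rounding error, which is the content of (3.6) and (3.7).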

R. Heydorn and R. Basu (on mixture models) in (Guseman, 1983)
adopt a formulation similar to the preceding. They show how to
estimate M and the $\pi_1, \ldots, \pi_M$ values by considering the $f_j(y)$
to be normal distributions, and hence, taking (3.5) to be a mixture of
normal distributions.

Even if the Heydorn-Basu distributional assumptions must be dropped
in favor of more complicated (non-normal, multivariate, etc.)
likelihood functions, (3.7) is an easily comprehended function and an
optimal data summary. Thus, (3.7) deserves much study in the light of
real Landsat data.

We have thus far ignored the time dimension. The values assumed
for the $y_i$ may incorporate this via Badhwar profile features, computed
from the "greenness" time series. Alternatively, the likelihood ratio
criterion here may indicate other time-summaries, induced by allowing
the $y_i$ to be the matrix of time and band dependent values.


4. A Simple Discriminant Example

The simple example here uses the univariate logarithm of likelihood
ratio data $z_i$, (4.2) below, as appropriate data summaries. We
then improve them, considered as logit estimates, by incorporating
other $z_j$ values from neighboring pixels. In the case of homoskedastic
(equal variances and covariances for the groups, an assumption not in
good agreement with Landsat data) normal distributions and M=2, the
$z_i$ are simply Fisher's discriminant functions. They are thus normally
distributed and are candidates for continuous parameter empirical Bayes
estimation, as described for the $\{Y_i\}$ values of section 2.

For independent homoskedastic normal measurements

(4.1)  $y_i \sim N(\mu_m, \sigma^2)$,

where m is one of two labels, 0 or 1, depending on which label applies
in pixel i, it is easy to show that

(4.2)  $z_i = z^1 + \delta\left[\dfrac{y_i - \bar\mu}{\sigma}\right]$

with $z^1 = \log(\pi_1/\pi_0)$, $\bar\mu = (\mu_1 + \mu_0)/2$,
$\sigma^2 = \mathrm{Var}(y_i)$, $\delta = (\mu_1 - \mu_0)/\sigma$.

Given (4.2), we estimate $p_i$ as

(4.3)  $\hat p_i = \exp(z_i) / [1 + \exp(z_i)]$.

Table 1 lists the $\hat p_i$ as the probability of soybean ($\theta = 1$)
versus an unassigned category ($\theta = 0$), taking $\pi_1 = \pi_0 = .5$.
Here $\delta = 1.5$, $\bar\mu = 52$ and $\sigma = 6$ are estimated from a
small amount of Band 3, Acquisition 4 data from one transect. (This
example is kept quite simple in order to illustrate the concepts most
clearly.)

Table 1

Thirteen pixels, in one west-east transect, first six unassigned,
last six soybean, middle pixel split. $y_i$ = Band 3 value from
Acquisition 4 (July 1978). $\hat p_i$ = probability of soybean using
$y_i$ only; $p_i^*$ is based on three point smoothing of the $y_i$ values
($y_i^*$). Average $p_i^*$ error slightly improves on average of
$\hat p_i$ for estimating true $\theta_i$ (average errors are .22 and .24).
$\tilde p_i$ estimates use strong spatial information involving prior
knowledge of groups of six pixels, with average error .03. See text.

Pixel i   True $\theta_i$   $y_i$   $\hat p_i$   $y_i^*$   $p_i^*$   $\tilde p_i$

    1          0             38       .03        39.3      .04      .00001
    2          0             42       .08        41.3      .06      .00001
    3          0             44       .12        44.3      .13      .00001
    4          0             47       .22        46.7      .21      .00001
    5          0             49       .32        47.7      .25      .00001
    6          0             47       .22        46.0      .18      .00001
    7         .5             42       .08        44.3      .13      .08
    8          1             44       .12        46.3      .20      .9997
    9          1             53       .56        52.0      .50      .9997
   10          1             59       .85        58.7      .84      .9997
   11          1             64       .95        61.3      .91      .9997
   12          1             61       .90        63.0      .94      .9997
   13          1             64       .95        63.0      .94      .9997

Average $|\theta_i - p_i|$:           .24                  .22      .03

Stein-type estimators, described later, would shrink the logit
values $z_i$ toward a smoothed version of the $z_i$. Here we smooth by
using a three point moving average $z_i^*$ involving the $z_j$ values in
the preceding and next pixel along the transect as recorded in Table 1.
We would ordinarily expect to use neighboring pixels to the north
and south too, but did not do so in this simple example involving just
one transect. The probabilities $p_i^*$ are on average slightly closer to
the true $\theta_i$ than are the $\hat p_i$. The amount of shrinkage toward
$z_i^*$ is estimated to be full (B = 1), in this example, and thus $p_i^*$
is also the Stein, or empirical Bayes, estimator. However, the shrinking
factor used in Morris (1983b), and discussed here in Section 5,
assumes the $y_i$, given the $\theta_i$, to be independent. In these data,
the $y_i$ appear to be spatially correlated, and, if so, shrinking factors
accounting for this must be developed.
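The $\hat p_i$ and $p_i^*$ columns of Table 1 can be recomputed from the $y_i$ values with (4.2) and (4.3). The sketch below assumes that the end pixels are smoothed by repeating the end value, a convention inferred from the tabulated $y_i^*$ rather than stated in the text; all names are illustrative.

```python
import math

Y = [38, 42, 44, 47, 49, 47, 42, 44, 53, 59, 64, 61, 64]   # Band 3 values, Table 1
THETA = [0] * 6 + [0.5] + [1] * 6                           # true labels, Table 1
DELTA, MU_BAR, SIGMA = 1.5, 52.0, 6.0                       # values quoted for Table 1


def logit(y, prior_logit=0.0):
    """Log-odds (4.2); prior_logit = log(pi_1 / pi_0) = 0 when pi_1 = pi_0 = .5."""
    return prior_logit + DELTA * (y - MU_BAR) / SIGMA


def prob(z):
    """Inverse logit (4.3)."""
    return 1.0 / (1.0 + math.exp(-z))


def smooth3(values):
    """Three point moving average; end values are repeated (assumed convention)."""
    padded = [values[0]] + list(values) + [values[-1]]
    return [sum(padded[i:i + 3]) / 3.0 for i in range(len(values))]


def mean_abs_error(p):
    return sum(abs(t - q) for t, q in zip(THETA, p)) / len(p)


p_hat = [prob(logit(y)) for y in Y]
y_star = smooth3(Y)
p_star = [prob(logit(y)) for y in y_star]
err_hat = mean_abs_error(p_hat)
err_star = mean_abs_error(p_star)
```

Because (4.2) is linear in $y_i$, smoothing the $y_i$ and smoothing the $z_i$ give the same $p_i^*$.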

The $\hat p_i$ in Table 1 can be improved enormously if one has more
spatial information. Suppose, for example, that we know that the
last six pixels are the same: either all are soybean, or none
are soybean. The $z_i$ values should then be summed over the six
pixels before computing the estimate of the soybean probability. That
probability, called $\tilde p_i$ in Table 1, is .9997 for each of the last
six pixels. Compare this with $\hat p_i$ = .12 for i = 8! We also get
$\tilde p_i$ = .00001 as the soybean probability in the first six pixels
("unassigned"). The only non-negligible error is the $\tilde p_7$ = .08
value for the middle (split) pixel.

Weaker forms of spatial information than that just discussed
can, and should, be used. For example, suppose it were known that
the 13 pixels in Table 1 begin with pixels in the unassigned category,
and switch to soybeans after a random pixel position "I". Then,
assuming equally likely probabilities for $I = 1, \ldots, 12$ a priori, and
independent $y_i$ values, the posterior probabilities of I are proportional
to the likelihood

(4.4)  $L(I) = \prod_{i=1}^{I} f_0(y_i) \prod_{i=I+1}^{13} f_1(y_i)$,  $I = 1, \ldots, 12$.

Formula (4.4) provides a probabilistic basis for estimating the change
point (areal boundary), and the probabilities. Of course, more
complicated models must be considered in realistic situations.
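Both the strong form of spatial information (known six-pixel groups) and the weaker change-point form (4.4) can be sketched for the Table 1 data. The class means below are derived from the quoted values, $\mu_1 - \mu_0 = \delta\sigma = 9$ about $\bar\mu = 52$; the normal densities follow (4.1), and the function names are hypothetical.

```python
import math

Y = [38, 42, 44, 47, 49, 47, 42, 44, 53, 59, 64, 61, 64]
MU0, MU1, SIGMA = 47.5, 56.5, 6.0   # from mu_bar = 52, delta = 1.5, sigma = 6


def log_normal_pdf(y, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (y - mean) ** 2 / (2 * sd * sd)


def group_posterior(ys, prior1=0.5):
    """P(label 1 | ys) when all pixels in ys are known to share one label."""
    log_odds = math.log(prior1 / (1 - prior1)) + sum(
        log_normal_pdf(y, MU1, SIGMA) - log_normal_pdf(y, MU0, SIGMA) for y in ys
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


def change_point_posterior(ys):
    """Posterior of I = 1, ..., n-1 under a uniform prior, proportional to (4.4)."""
    n = len(ys)
    log_like = []
    for i in range(1, n):
        ll = sum(log_normal_pdf(y, MU0, SIGMA) for y in ys[:i])
        ll += sum(log_normal_pdf(y, MU1, SIGMA) for y in ys[i:])
        log_like.append(ll)
    top = max(log_like)                       # subtract the maximum for stability
    weights = [math.exp(ll - top) for ll in log_like]
    total = sum(weights)
    return [w / total for w in weights]


p_last_six = group_posterior(Y[7:])
p_first_six = group_posterior(Y[:6])
posterior = change_point_posterior(Y)
map_change_point = 1 + posterior.index(max(posterior))
```

With equal priors the group log-odds is simply the sum of the individual $z_i$, which is why the text speaks of summing them. For these data the change-point posterior is spread mainly over positions 7 to 10.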

Other forms of logistic regression and discriminant analysis
have been proposed to deal with spatial correlation; see, for example,
(Switzer, 1980).


5. Shrinkage Estimation Using Affinity Matrices

We  developed  the  notion  of  "affinity  matrices"  in  an  earlier 
report  (Morris,  1983a).  These  n x n matrices,  n = number  of  pixels, 
indicate the spatial affinity of pixels. An affinity matrix A is a
stochastic matrix, the rows of A being probability vectors: $Ae = e$,
$e \equiv (1, \ldots, 1)'$ being the vector of ones. Generally A will be a
sparse  matrix,  only  a few  neighboring  pixels  being  chosen  to  help 
estimate  any  particular  one.  Estimates  z*  like  those  in  (5.1)  below 
are  similar  to  moving  average  estimates. 

The log-odds $z_i$ for pixel i are based on the raw data $y_i$ for
that pixel. Stein-type shrinkage estimators, used in conjunction
with affinity matrices, and applied to the $z_i$ values, can improve
the logit estimate $z_i$ by shrinking $z_i$ to a smoothed value $z_i^*$
computed as an average of responses over neighboring values. That is,
letting

(5.1)  $z^* = Az$,  $z = (z_1, \ldots, z_n)'$,

A an affinity matrix, then $z^*$ is a vector of spatially smoothed log-odds
estimates. We need to choose between $z$ and $z^*$, however. A Stein-type
shrinkage rule allows the data to determine the degree to which $z^*$
should be used in preference to $z$, by employing a shrinking factor B
in

(5.2)  $\hat z_i = (1 - B) z_i + B z_i^*$,

with B calculated as

(5.3)  $B = \dfrac{(k - r - 2)V}{\sum_i (z_i - z_i^*)^2}$.

The value r in (5.3) is chosen to account for the use of A, being the
trace of A if A is symmetric (Morris, 1983a), and V is the common,
known variance of the $z_i$, being $\delta^2$ in the formulation of (4.2).

Minimax  results  with  respect  to  squared  error  loss  for  some  estimators 
of  this  type  are  given  in  (Morris,  1983a). 
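A minimal sketch of (5.1) to (5.3) for the Table 1 transect follows. It assumes a three point moving-average affinity matrix with the end weight folded back onto the end pixel, takes k in (5.3) to be the number of pixels n, and truncates B to the interval [0, 1]; these choices and the function names are illustrative assumptions, not taken from (Morris, 1983a).

```python
def moving_average_affinity(n):
    """Three point moving-average affinity matrix (rows sum to one)."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            a[i][min(max(j, 0), n - 1)] += 1.0 / 3.0   # fold edge weight inward
    return a


def shrink_toward_neighbors(z, a, v):
    """Global shrinkage (5.2) with the factor (5.3), truncated to [0, 1]."""
    n = len(z)
    z_star = [sum(a[i][j] * z[j] for j in range(n)) for i in range(n)]   # (5.1)
    r = sum(a[i][i] for i in range(n))          # trace of A (A symmetric here)
    s = sum((zi - zs) ** 2 for zi, zs in zip(z, z_star))
    b = 1.0 if s == 0 else min(1.0, max(0.0, (n - r - 2) * v / s))
    z_hat = [(1 - b) * zi + b * zs for zi, zs in zip(z, z_star)]
    return z_hat, z_star, b, r


Y = [38, 42, 44, 47, 49, 47, 42, 44, 53, 59, 64, 61, 64]
z = [1.5 * (y - 52.0) / 6.0 for y in Y]         # logits from (4.2)
A = moving_average_affinity(len(z))
z_hat, z_star, B, r = shrink_toward_neighbors(z, A, v=1.5 ** 2)
```

For these data the untruncated ratio exceeds one, so B = 1 and $\hat z_i = z_i^*$, in agreement with the full shrinkage reported in Section 4.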

For spatial data, which are only locally homogeneous, an estimator
with a localized shrinkage factor can be expected to improve upon
estimators like those of (5.2), (5.3), which use a single, global,
shrinkage factor. When the shrinkage factor is calculated separately
for each pixel, (5.2) becomes

(5.4)  $\hat z_i = (1 - B_i) z_i + B_i z_i^*$.

If $A = (a_{ij})$, then a choice of $B_i$ is, from (Kostal, 1983),

(5.5)  $B_i = \dfrac{d_i V}{\sum_j a_{ij} (z_j - z_i^*)^2}$.

Here $d_i$ is a suitably chosen positive constant depending on A,
allowing the shrinkage in pixel i to be determined by the $z_j$ values for
pixels to which the affinity matrix assigns nonzero weight.


6. Empirical Bayes for Time Series Analyses

Thus far, this paper ignores the time-series characteristics of
the data, but Landsat data include a time series $\{y_{it}\}$ for each pixel
i (typically, 5 times). For simplicity, we shall first consider the
time series $\{y_t\}$ for a given pixel.

A Bayesian structure for time series analysis is given by Harrison
and Stevens (1976). Their DLM (dynamic linear model) consists of an
observation distribution

(6.1)  $y_t \mid F_t, \theta_t, V_t \sim N(F_t \theta_t, V_t)$,

with independent error terms. The parameter distribution, also with
independent error terms, is specified as

(6.2)  $\theta_t \mid G, \theta_{t-1}, W_t \sim N(G \theta_{t-1}, W_t)$.

The series is initialized by specifying

(6.3)  $\theta_0 \mid m_0, C_0 \sim N(m_0, C_0)$.

The posterior distribution of $\theta_t$ given $y^t = (y_1', \ldots, y_t')'$ is

(6.4)  $\theta_t \mid y^t \sim N(m_t, C_t)$,

where $m_t$ and $C_t$, given recursively by the Kalman filter, are the
posterior mean vector and covariance matrix. The posterior mean $m_t$
provides an estimate of $\theta_t$. These moments cannot be calculated
unless all the process parameters are known. If there are unknown
process parameters, such as $m_0$ and $C_0$ (the prior moments), they often
can be estimated using the marginal distribution of $y^t$. These
estimates then are used to estimate the posterior mean $m_t$ and thus
$\theta_t$.
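The Kalman recursion for $m_t$ and $C_t$ can be sketched in the scalar case. The fragment below is a simplification under stated assumptions: $F_t$, $V_t$ and $W_t$ are taken to be constant scalars f, v and w, and the function name is hypothetical; the general DLM uses the matrix analogues of the same steps.

```python
def kalman_filter(ys, f, g, v, w, m0, c0):
    """Scalar DLM filter for (6.1)-(6.3); returns [(m_t, C_t)] for t = 1, 2, ...

    Assumes constant scalar F_t = f, V_t = v, W_t = w (a simplification).
    """
    m, c = m0, c0
    moments = []
    for y in ys:
        a = g * m                  # prior mean of theta_t given y^{t-1}
        r = g * c * g + w          # prior variance of theta_t
        q = f * r * f + v          # one-step forecast variance of y_t
        gain = r * f / q           # Kalman gain
        m = a + gain * (y - f * a)     # posterior mean m_t
        c = r - gain * f * r           # posterior variance C_t
        moments.append((m, c))
    return moments


# Constant-level example: theta_t = theta_{t-1}, unit observation variance.
moments = kalman_filter([1.0, 2.0, 3.0], f=1.0, g=1.0, v=1.0, w=0.0, m0=0.0, c0=1.0)
```

In this constant-level example the final posterior mean is the usual precision-weighted average of the prior mean and the three observations.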

When several time series $\{y_{it}\}$ follow (6.1) - (6.2) independently
with different initializing distributions

(6.5)  $\theta_{0i} \mid m_{0i}, C_{0i} \sim N(m_{0i}, C_{0i})$,

empirical Bayes methods lead to estimates of $m_{0i}$ with smaller mean
squared error than those obtained from the marginal distributions
of $y_i^t$ for pixel i alone.

The parameters $F_t$, $V_t$, $G$, $W_t$, $m_0$ and $C_0$ in (6.1) - (6.3) will
depend on the crop type in the pixel. Let $\mathcal{M}_m$ denote the model
which obtains when the pixel contains crop type m ($m = 1, \ldots, M$). The
prior probability of each model is the prior probability of each
crop type, $\pi = (\pi_1, \ldots, \pi_M)'$. Thus the response density (3.1),
$f_m(y^t)$, is the marginal density of $y_i^t$ for pixel i under model
$\mathcal{M}_m$. This density would be used to obtain the logit $z_i$, as
in (3.6), thus incorporating the time-aspect of the Landsat data into the
probabilistic structure of Section 3.


Abstract


This  paper  is  concerned  with  the  use  of  spline  functions  in  the 
development  of  classification  algorithms.  In  particular,  a method  is 
formulated  for  producing  spline  approximations  to  univariate  density 
functions  when  each  density  function  is  described  by  a histogram  of 
measurements.  The  resulting  approximations  are  then  incorporated  into  a 
Bayesian classification procedure for which the probability of
misclassification can be readily computed. Some preliminary numerical
results are presented to illustrate the method.

§1. Introduction.

This  paper  is  concerned  with  the  use  of  spline  functions  as  a tool  in 
statistical  pattern  classification  algorithms.  In  particular,  we  show  how 
splines  can  be  used  to  estimate  the  conditional  density  functions  for  the 
classes  of  interest  and  to  find  the  associated  classification  regions. 
Moreover, we also show how to compute the probability of misclassification
associated with the algorithm.

The paper is divided into 6 sections. In Section 2 we discuss the
general Bayes classification procedure. In Section 3 we present a method
for estimating densities based on polynomial splines. The problems of
computing the related classification regions and the probability of
misclassification are treated in Sections 4 and 5, respectively. We
close the paper with a discussion of examples and future research.


§2. The Bayes Classification Procedure.

Let $\pi_1$ and $\pi_2$ be distinct classes of interest with known a-priori
probabilities $\alpha_1$ and $\alpha_2$, respectively. Let
$X : \pi_1 \cup \pi_2 \to R$ be a random variable, where $X(\omega) = x$ is
the measurement in R taken from an element $\omega$ of $\pi_1 \cup \pi_2$.
Suppose that the measurements of elements from each of $\pi_1$ and $\pi_2$
are characterized by density functions $p_1$ and $p_2$, respectively.
Then the Bayes optimal classifier is defined as follows:

Assign an element $\omega$ to $\pi_i$ if its measurement $x = X(\omega)$
belongs to $R_i$, i = 1,2, where $R_1$ and $R_2$ are the Bayes Decision
Regions defined by

(2.1)  $R_1 = \{x \in R : \alpha_1 p_1(x) \ge \alpha_2 p_2(x)\}$,
       $R_2 = R \setminus R_1$.

The numerical implementation of this classification procedure
requires the determination of the sets $R_1$ and $R_2$, which in turn
amounts to finding the roots of the equation
$\alpha_1 p_1(x) - \alpha_2 p_2(x) = 0$.

Associated with this classification scheme, we define the probability
of misclassification (cf. [1,2]) by

(2.2)  $G = 1 - \int_R \max[\alpha_1 p_1(x), \alpha_2 p_2(x)]\,dx$
         $= \alpha_1 \int_{R_2} p_1(x)\,dx + \alpha_2 \int_{R_1} p_2(x)\,dx$.

In general, the evaluation of G is a difficult numerical problem,
even when $p_1$ and $p_2$ are known density functions. One case where G can
be computed exactly (along with the Bayes decision regions $R_1$ and $R_2$)
is the case where $p_1$ and $p_2$ are known or estimated univariate normal
density functions (cf. [12] and [13]).
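For the normal case just mentioned, the decision regions and G can be sketched directly: the logarithm of $\alpha_1 p_1(x)/\alpha_2 p_2(x)$ is a quadratic in x, so $R_1$ is bounded by at most two roots and G follows from the normal distribution function. The code below is an illustrative sketch with hypothetical function names, not the procedure of [12] or [13].

```python
import math


def normal_cdf(x, mu, sd):
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def region_one(a1, mu1, sd1, a2, mu2, sd2):
    """Intervals on which a1 * p1(x) >= a2 * p2(x), i.e. R_1 of (2.1)."""
    # log(a1 p1 / a2 p2) = a x^2 + b x + c
    a = 1.0 / (2 * sd2 ** 2) - 1.0 / (2 * sd1 ** 2)
    b = mu1 / sd1 ** 2 - mu2 / sd2 ** 2
    c = (mu2 ** 2 / (2 * sd2 ** 2) - mu1 ** 2 / (2 * sd1 ** 2)
         + math.log(a1 * sd2 / (a2 * sd1)))
    inf = math.inf
    if abs(a) < 1e-15:                      # equal variances: one boundary
        if abs(b) < 1e-15:
            return [(-inf, inf)] if c >= 0 else []
        root = -c / b
        return [(root, inf)] if b > 0 else [(-inf, root)]
    disc = b * b - 4 * a * c
    if disc <= 0:
        return [(-inf, inf)] if a > 0 else []
    lo, hi = sorted(((-b - math.sqrt(disc)) / (2 * a),
                     (-b + math.sqrt(disc)) / (2 * a)))
    return [(lo, hi)] if a < 0 else [(-inf, lo), (hi, inf)]


def misclassification(a1, mu1, sd1, a2, mu2, sd2):
    """G of (2.2): a1 * P1(R_2) + a2 * P2(R_1)."""
    r1 = region_one(a1, mu1, sd1, a2, mu2, sd2)
    p1_r1 = sum(normal_cdf(hi, mu1, sd1) - normal_cdf(lo, mu1, sd1) for lo, hi in r1)
    p2_r1 = sum(normal_cdf(hi, mu2, sd2) - normal_cdf(lo, mu2, sd2) for lo, hi in r1)
    return a1 * (1.0 - p1_r1) + a2 * p2_r1


g_equal = misclassification(0.5, 0.0, 1.0, 0.5, 2.0, 1.0)
```

With equal priors and unit variances the boundary is the midpoint of the two means, and G reduces to the normal tail area beyond one standard deviation.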

In most practical problems, the densities $p_1$ and $p_2$ will not be
known, and an essential first step in performing Bayesian classification
is to compute reasonable estimates of these densities. This is a
classical problem in statistical analysis. One of the standard
nonparametric approaches to this problem is to approximate $p_1$ and $p_2$ by
fitting histograms constructed from measurements taken from elements of
the corresponding classes. We discuss this fitting problem in the
following section.

§3.  Estimating  Densities  Using  Splines. 

In this section we discuss the problem of fitting a spline function
to a histogram. We begin with some notation. Suppose that
$t_1 < t_2 < \ldots < t_{N+1}$ and $h_1, \ldots, h_N$ are given real numbers.
These numbers describe a histogram function $h: R \to R$, defined by

(3.1)  $h(x) = h_i$, if $x \in [t_i, t_{i+1})$, $1 \le i \le N$;
       $h(x) = 0$, otherwise.

The values $t_i$, $1 \le i \le N+1$, describe the edges of the bins of the
histogram, while the values $h_i$, $1 \le i \le N$, describe the height of
each bin (cf. Figure 1).

Several techniques have been developed for approximating histograms
using spline functions. In what appears to be the first paper on the
subject, Bedau [3] constructs the natural spline s which interpolates the
histogram in the sense that $s(x_i) = h_i$, $i = 1, \ldots, N$, where
$x_i = (t_i + t_{i+1})/2$ are the centers of the bins. Later Boneva,
Kendall, & Stefanov [7] and Schoenberg [17] analyzed the problem of finding
a spline s (the integral of a natural spline) which fits the histogram in
the sense that

$\int_{t_i}^{t_{i+1}} s(t)\,dt = h_i (t_{i+1} - t_i)$,  $i = 1, \ldots, N$.

This condition assures that the area under the spline between each pair of
points $t_i$ and $t_{i+1}$ exactly matches the area in the corresponding bin
of the histogram. These authors referred to their approximations as
histosplines. Schoenberg [17] also considered fitting histograms using
smoothing natural splines (and referred to the resulting fits as
splinograms). But as observed later by the above authors and others (cf.
[8]), a major drawback of methods based on natural splines is the tendency
of the fitting spline to dip below the axis near the ends of its support
set.

Another approach to fitting a histogram h(x) using splines is to
attempt to construct an approximating s(x) as a linear combination of
B-splines. To discuss B-splines, we now introduce further notation.
Suppose that $y_1 < y_2 < \ldots < y_{n+m}$ is a set of real numbers. Then
associated with these points there is a set of B-splines
$B_1(x), \ldots, B_n(x)$ with the properties:

$B_i(x)$ is a piecewise polynomial of order m with join points (knots)
located at the points $y_i, \ldots, y_{i+m}$;

$B_i(x)$ has m-2 continuous derivatives on R;

$B_i(x)$ is positive on $(y_i, y_{i+m})$ and vanishes elsewhere;

$B_i(x)$ can be computed efficiently and accurately.

An example of quadratic B-splines (m=3) defined for equally-spaced knots
is presented in Figure 2.

B-splines possess a variety of other important properties which make
them ideal for approximation purposes (cf. the books [9,18]). In
particular, linear combinations of the form

(3.2)  $s(x) = \sum_{j=1}^{n} c_j B_j(x)$

are easy to manipulate on a digital computer. The use of B-spline series
of this form also has the advantage that s has support on the interval
$[y_1, y_{n+m}]$, and if we choose all the coefficients to satisfy the
constraint

(3.3)  $c_j \ge 0$,  $j = 1, \ldots, n$,

then s will also be a nonnegative function.
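The B-spline series (3.2) can be evaluated with the standard Cox-de Boor recursion. The sketch below assumes the usual normalization in which the B-splines sum to one wherever m of them overlap; the text does not say which normalization is intended, and the function names are hypothetical. Indices are 0-based, so `bspline(j, ...)` corresponds to $B_{j+1}$.

```python
def bspline(j, m, knots, x):
    """B-spline of order m (degree m-1) on knots[j], ..., knots[j+m], at x."""
    if m == 1:
        return 1.0 if knots[j] <= x < knots[j + 1] else 0.0
    left = right = 0.0
    if knots[j + m - 1] > knots[j]:
        left = ((x - knots[j]) / (knots[j + m - 1] - knots[j])
                * bspline(j, m - 1, knots, x))
    if knots[j + m] > knots[j + 1]:
        right = ((knots[j + m] - x) / (knots[j + m] - knots[j + 1])
                 * bspline(j + 1, m - 1, knots, x))
    return left + right


def spline_value(coeffs, m, knots, x):
    """The series (3.2); len(knots) must equal len(coeffs) + m."""
    return sum(c * bspline(j, m, knots, x) for j, c in enumerate(coeffs))


# Quadratic B-splines (m = 3) on equally spaced knots, as in Figure 2.
knots = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
peak = bspline(0, 3, knots, 1.5)
total = spline_value([1.0, 1.0, 1.0, 1.0], 3, knots, 2.5)
```

Since every $B_j$ is nonnegative, any coefficient vector satisfying (3.3) gives a nonnegative s, which is the point of the constraint.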

The first author to use B-spline series as in (3.2) to fit densities
appears to be Marsaglia [15]. His approach was to find coefficients
$c_1, \ldots, c_n$ to maximize $c_1 + \ldots + c_n$ subject to (3.3) and the
constraint that $s(x) \le p(x)$, all $x \in R$. This can be recast as a linear
programming problem. Although Marsaglia obtained reasonably good results
with this technique for smooth functions p, when applied to histogram
functions h it tends to produce a spline s which lies substantially under
the histogram.

Another approach to constructing a spline s of the form (3.2) fitting
a histogram h as in (3.1) is to choose $c_1, \ldots, c_n$ to minimize in some
sense the vector $e = [e_1, \ldots, e_N]$ with $e_i = s(x_i) - h_i$, and, as
before, $x_i = (t_i + t_{i+1})/2$, $i = 1, \ldots, N$.
Bennett [4,5] considered the cases where the quantity to be minimized is
either the $\ell_1$ or $\ell_\infty$ norm of the vector e. Both of these
problems (subject to the constraint (3.3)) can be cast as linear
programming problems.

Since we are working with histograms as approximations to a density
function, it seems to us that it is important to match areas (cf. the
above discussion of the methods of splinograms and histosplines). Thus we
propose the following alternative to the above spline methods: Find
$c_1, \ldots, c_n$ satisfying the constraint (3.3) such that the resulting
spline minimizes the expression

(3.4)  $\|s - h\|_1 = \int_{-\infty}^{\infty} |s(x) - h(x)|\,dx$.

This problem can be recast as:

(3.5)  minimize $e_1 + e_2 + \ldots + e_N$

over $c_j \ge 0$, $1 \le j \le n$ and $e_i \ge 0$, $1 \le i \le N$, subject to
the constraints

(3.6)  $-e_i \le \sum_{j=1}^{n} c_j I_{ij} - A_i \le e_i$,  $i = 1, \ldots, N$,

where $A_i = h_i (t_{i+1} - t_i)$ is the area of the i-th bin, and

(3.7)  $I_{ij} = \int_{t_i}^{t_{i+1}} B_j(t)\,dt$,  $i = 1, \ldots, N$ and $j = 1, \ldots, n$.


Problem (3.5) is easily translated into a standard linear programming
problem which can be solved using readily available packages. The numbers
$I_{ij}$ in (3.7) can be computed easily by well-known B-spline algorithms
(cf. p. 200 of [18]). The application of this method to a practical
problem requires the selection of the order m of the spline as well as the
number and location of the knots. In general, we recommend that m be
taken to be 2, 3 or 4, which leads to linear, quadratic, and cubic splines,
respectively.
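The linear program (3.5) to (3.7) can be sketched with SciPy as one "readily available package". This is an illustration under assumptions, not the authors' implementation: it uses `scipy.interpolate.BSpline.basis_element` (whose basis functions are normalized to sum to one, not to unit area) for the integrals (3.7) and `scipy.optimize.linprog` for the optimization; the histogram, knots and function names are made up for the example.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import linprog


def bin_integrals(edges, knots, order):
    """Matrix I[i, j] of (3.7): integral of B_j over bin i."""
    n = len(knots) - order
    mat = np.zeros((len(edges) - 1, n))
    for j in range(n):
        local = knots[j:j + order + 1]            # knots supporting B_j
        elem = BSpline.basis_element(local, extrapolate=False)
        for i in range(len(edges) - 1):
            lo = max(edges[i], local[0])          # clip the bin to the support
            hi = min(edges[i + 1], local[-1])
            if lo < hi:
                mat[i, j] = elem.integrate(lo, hi)
    return mat


def fit_histogram(edges, heights, knots, order):
    """Solve (3.5)-(3.6) for nonnegative coefficients c and slacks e."""
    edges = np.asarray(edges, dtype=float)
    areas = np.asarray(heights, dtype=float) * np.diff(edges)      # A_i
    integ = bin_integrals(edges, knots, order)
    n_bins, n = integ.shape
    cost = np.concatenate([np.zeros(n), np.ones(n_bins)])          # sum of e_i
    a_ub = np.block([[integ, -np.eye(n_bins)],                     #  I c - e <= A
                     [-integ, -np.eye(n_bins)]])                   # -I c - e <= -A
    b_ub = np.concatenate([areas, -areas])
    res = linprog(cost, A_ub=a_ub, b_ub=b_ub)    # default bounds keep c, e >= 0
    return res.x[:n], res.fun, integ, areas


edges = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
heights = [0.05, 0.25, 0.40, 0.25, 0.05]
knots = [-1.0, 0.5, 1.5, 2.5, 3.5, 4.5, 6.0]     # quadratic case, knots at bin centers
coeffs, objective, integ, areas = fit_histogram(edges, heights, knots, order=3)
```

The returned objective is the total absolute mismatch between the spline's bin areas and the histogram's bin areas; it is zero exactly when every bin area is matched.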

The selection of the knots is a more difficult problem. So far our
numerical tests have been conducted with visual selection of the knots.
Our experience suggests that it is reasonable to select the first and last
knots at $t_1 - w$ and $t_{N+1} + w$, where w is the average bin width. A
reasonable choice for the remaining knots is to place one at the center of
each bin for odd orders, and at the bin edges for even orders. If
additional knots are desired, they should be added in regions where the
histogram has rapid changes in height. It is even possible to insert
multiple knots (where a given knot location is selected two or more
times). Multiple knots reduce the smoothness of the spline while adding
to its flexibility. For a given order m, it is clear that the difference
between the spline s and the histogram h measured in the $\ell_1$-norm
decreases as we add more and more knots.

§4.  Finding  the  Bayes  Decision  Regions. 

Suppose now that we are attempting to build a Bayes classifier
corresponding to two classes as in Section 2, and that we have
approximations $s_1$ and $s_2$ to the corresponding densities $p_1$ and
$p_2$. We now address the problem of finding the Bayes decision regions

$\hat R_1 = \{x \in R : \alpha_1 s_1(x) \ge \alpha_2 s_2(x)\}$ and $\hat R_2 = R \setminus \hat R_1$.

As noted in Section 2, this problem is equivalent to finding the
zeros of the function

(4.1)  $r(x) = \alpha_1 s_1(x) - \alpha_2 s_2(x)$.

If $s_1$ and $s_2$ are both splines of the same order m based on the same
set of knots, then r is also a spline of the same type, and our problem is
reduced to locating its zeros. In general, however, we may choose $s_1$ and
$s_2$ to be splines of different orders (say $m_1$ and $m_2$) and based on
different knot sequences $\Delta_1$ and $\Delta_2$. In this case the
following observation is important.

THEOREM: If $s_i$ are splines of order $m_i$ corresponding to knot
sequences $\Delta_i$, i = 1,2, then the function r defined in (4.1) is a
spline of order $m = \max(m_1, m_2)$ with knots $\Delta = \Delta_1 \cup \Delta_2$.

Proof: It is clear that both $s_1$ and $s_2$ are piecewise polynomials of
order m between the knots of $\Delta$, and it follows that r is also. The
fact that r has m-2 continuous derivatives on R is easily checked. □

In order to translate this theorem into a useful algorithm for
finding the zeros of $r$, it is desirable to rewrite both $s_1$ and $s_2$
as B-spline expansions in terms of B-splines of order $m$ defined on the
knot sequence $\Delta$. Fortunately, there are stable algorithms for
converting a B-spline expansion of given degree with given knots to an
equivalent B-spline expansion of another degree with a larger set of
knots (cf. [6,11]). There is no need to examine these algorithms in
detail here; we have programmed them for our classification package.

After writing $s_1$ and $s_2$ as linear combinations of a common set of
B-splines, the problem of finding the zeros of the function $r$ defined
in (4.1) reduces to the problem of finding the zeros of a given B-spline
expansion. This problem can be attacked by converting the B-spline
expansion to a piecewise polynomial representation and then finding the
zeros of each polynomial piece. However, more robust and efficient
methods for finding zeros of splines are being developed (cf. [14]).
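
The piece-by-piece search described above can be illustrated with a small Python sketch. It is not the FORTRAN package of this paper: the two fits are assumed to be available as plain callables, the helper names (`hat`, `bisect_zero`, `zeros_of_r`) are hypothetical, and the example densities are two linear B-splines chosen only for illustration. The sketch merges the two knot sets, samples each polynomial piece, and refines every strict sign change of r by bisection. Zeros of even multiplicity, or zeros that fall exactly on a sample point, would need extra handling.

```python
def hat(center):
    """Linear B-spline (order 2) with knots center-1, center, center+1."""
    return lambda x: max(0.0, 1.0 - abs(x - center))

def bisect_zero(r, a, b, tol=1e-12):
    """Refine a bracketed sign change of r on [a, b] by bisection."""
    fa = r(a)
    while b - a > tol:
        mid = 0.5 * (a + b)
        fm = r(mid)
        if fm == 0.0:
            return mid
        if (fa < 0.0) != (fm < 0.0):
            b = mid            # sign change lies in [a, mid]
        else:
            a, fa = mid, fm    # sign change lies in [mid, b]
    return 0.5 * (a + b)

def zeros_of_r(s1, s2, alpha1, alpha2, knots1, knots2, samples=64):
    """Strict sign changes of r = alpha1*s1 - alpha2*s2, piece by piece."""
    r = lambda x: alpha1 * s1(x) - alpha2 * s2(x)
    knots = sorted(set(knots1) | set(knots2))   # union of the two knot sets
    zeros = []
    for lo, hi in zip(knots, knots[1:]):        # one polynomial piece at a time
        xs = [lo + (hi - lo) * k / samples for k in range(samples + 1)]
        for a, b in zip(xs, xs[1:]):
            if r(a) * r(b) < 0.0:
                zeros.append(bisect_zero(r, a, b))
    return zeros

# Illustrative data (assumed): priors 0.6 and 0.4, hat densities at 1 and 2.
zeros = zeros_of_r(hat(1.0), hat(2.0), 0.6, 0.4, [0, 1, 2], [1, 2, 3])
print(zeros)
```

On the middle piece [1, 2] the difference is 0.6(2 - x) - 0.4(x - 1) = 1.6 - x, so the single decision boundary found is near x = 1.6.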

§5.  Computing  the  Probability  of  Misclassification. 

Suppose again that $s_1$ and $s_2$ are spline approximations to the
densities $p_1$ and $p_2$, and suppose that we have found the associated
Bayes decision regions $\hat{R}_1$ and $\hat{R}_2$. Then it is clear that
an approximation to the probability of misclassification $G$ associated
with the densities $p_1$ and $p_2$ is given by the expression

(5.1)  $\hat{G} = \alpha_1 \int_{\hat{R}_2} s_1(x)\,dx + \alpha_2 \int_{\hat{R}_1} s_2(x)\,dx.$

Since both $s_1$ and $s_2$ are B-spline series and the sets $\hat{R}_1$
and $\hat{R}_2$ are unions of intervals, to compute $\hat{G}$ we need to
be able to integrate a given B-spline series over any given finite
interval [a,b]. But there exist standard, highly efficient and accurate
algorithms for just this purpose (cf. p. 200 of [18]). We have
implemented such a package and (up to roundoff) it produces the values
of $\hat{G}$ exactly.
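
As a hedged illustration of why the integral is exact up to roundoff, the following Python sketch integrates piecewise polynomials term by term over unions of intervals. It works on a power-form representation rather than on B-spline coefficients, so it is a stand-in for, not a copy of, the algorithm cited above. The function names and the example data (two linear "hat" densities, priors 0.6 and 0.4, decision boundary at x = 1.6 where 0.6 s1 = 0.4 s2) are assumptions made for illustration.

```python
def integrate_pp(breaks, coeffs, a, b):
    """Exactly integrate a piecewise polynomial over [a, b].

    breaks: increasing breakpoints; coeffs[i] holds the coefficients of
    piece i in powers of (x - breaks[i]). The function is taken to be
    zero outside [breaks[0], breaks[-1]].
    """
    total = 0.0
    for i, c in enumerate(coeffs):
        lo, hi = max(a, breaks[i]), min(b, breaks[i + 1])
        if lo >= hi:
            continue                      # piece does not meet [a, b]
        def antiderivative(x, left=breaks[i], c=c):
            t = x - left
            return sum(ck * t ** (k + 1) / (k + 1) for k, ck in enumerate(c))
        total += antiderivative(hi) - antiderivative(lo)
    return total

def misclassification(s1, s2, alpha1, alpha2, region1, region2):
    """alpha1 * (mass of s1 on region2) + alpha2 * (mass of s2 on region1)."""
    err1 = sum(integrate_pp(*s1, a, b) for a, b in region2)
    err2 = sum(integrate_pp(*s2, a, b) for a, b in region1)
    return alpha1 * err1 + alpha2 * err2

# Hat densities centered at 1 and 2, written piece by piece (assumed data).
s1 = ([0.0, 1.0, 2.0], [[0.0, 1.0], [1.0, -1.0]])
s2 = ([1.0, 2.0, 3.0], [[0.0, 1.0], [1.0, -1.0]])
region1 = [(0.0, 1.6)]    # where 0.6*s1 > 0.4*s2
region2 = [(1.6, 3.0)]
G_hat = misclassification(s1, s2, 0.6, 0.4, region1, region2)
print(G_hat)
```

Here the two error masses are 0.08 and 0.18, so the estimate is 0.6(0.08) + 0.4(0.18) = 0.12.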
§6. Discussion.

The  spline  classification  algorithm  outlined  in  this  paper  has  been 
implemented  as  a FORTRAN  package.  The  package  consists  of  a set  of 
subroutines  which  performs  density  fits  for  given  histograms,  finds  the 
classification  regions,  and  computes  the  associated  probability  of 
misclassification.  In  addition,  the  package  includes  various  subroutines 
for  evaluating,  integrating,  graphing,  and  finding  the  zeros  of  B-spline 
series.  A FORTRAN  implementation  of  an  algorithm  of  Ravindran  [16]  is 
used  to  solve  the  linear  programming  problem  (3.5)  - (3.7). 

Some preliminary fits to the histogram given in Figure 1 were made
using quadratic and cubic B-splines. In Figures 3 and 4 we present the
fits obtained using quadratic B-splines with different interior knot
selections and multiple knots at the endpoints. Figures 5 and 6 present
the fits obtained using cubic B-splines with interior knots at the bin
centers and multiple knots at different left endpoints. An additional
knot was inserted (at 0.0) for the fit presented in Figure 7.

Using the results of the quadratic B-spline fit (Figure 4) to the
original histogram and its translate (by 4 units) we determined the Bayes
decision regions $\hat{R}_1$ and $\hat{R}_2$ and, assuming equal a priori
probabilities, computed the resulting value of $\hat{G}$. These results
appear in Figure 8.

In  this  paper  we  have  concentrated  on  the  classification  problem  for 
two  classes.  It  is  clear  that  most  of  what  we  have  said  carries  over  to 
the  case  of  three  or  more  classes.  In  particular,  the  histograms  for  each 
class  can  be  fit  with  splines  in  the  same  way  as  described  here.  To  find 
the  classification  regions  now  will  require  pairwise  comparison  of  the 
spline  fits  to  the  densities.  The  probability  of  misclassification  can 
then  be  found  as  before. 




This  paper  has  dealt  only  with  univariate  classification.  We  intend 
to  apply  similar  techniques  to  the  multivariate  case.  In  particular,  we 
intend  to  fit  multivariate  histogram  functions  using  either  tensor-product 
splines  or  multivariate  B-splines  defined  on  triangulations.  In  either 
case  we  expect  to  be  able  to  accurately  find  the  classification  regions 
and  to  compute  the  probability  of  misclassification. 

Figure 1: Original Histogram

Figure 3: Quadratic B-spline Fit, Knots at Bin Centers

Figure 5: Cubic B-spline Fit Over [-.125, 3.375]

Cubic B-spline Fit Over [-.25, 3.375]

Quadratic B-spline Fits to Original Histogram and Its Translate

ABSTRACT

Quantile data analysis and functional statistical inference methods
are introduced and applied to provide representations of spectral data
which may lead to simple statistical discriminators effective for the
estimation of ground truth from satellite spectral measurements.

To estimate the ground truth of a pixel, we propose to estimate the
probability of each possible ground truth, given observed (estimated)
quantile-theoretic statistical characteristics of the multi-spectral
image data corresponding to the pixel and its neighboring pixels. This
paper describes a research strategy for determining which statistical
characteristics discriminate best.

Results are reported of quantile data analysis of an extensive
collection of training files of image data.

1. Introduction

To conduct research in image analysis, one must define its data,
ends, and means.

The data consist of files. An image file consists of measurements
taken on a specified date at a specified 5x6 nautical mile site on the
earth's surface. A site is divided into a rectangular grid of (more than
20,000) surface elements [approximately 1 acre] called pixels. On each
pixel, spectral measurements are made by satellite on four (and perhaps
seven) channels [of the electromagnetic energy spectrum]. Each spectral
measurement is an integer from 0 to 256.

The ends [goals] of image analysis are to estimate ground truth
within the pixel; labels for ground truth include alfalfa, corn,
soybeans, sugar beets, spring wheat, spring oats, grass, pasture,
trees, water.

A file  is  called  a training  file  if  a ground  truth  record  is 
available;  each  pixel  is  divided  into  six  sub-pixels  and  ground  truth 
is  recorded  for  each  sub-pixel. 

The means of image analysis are currently under investigation by
many investigators. A probability approach considers ground truth as
a parameter [denoted $\theta$]. A formal Bayesian statistical solution to
the estimation of ground truth from data is to calculate
$p(\theta \mid \text{data})$, the posterior probability distribution of
$\theta$ [the ground truth parameter] given the data. A formal maximum
likelihood solution to the estimation of ground truth from data consists
of two steps: (1) calculate the likelihood function of $\theta$, which
equals $p(\text{data} \mid \theta)$, the conditional distribution of the
data given that it is observed from a pixel with ground truth $\theta$,
and (2) use optimization algorithms to determine $\hat{\theta}$, the
parameter value which maximizes likelihood. The foregoing formal
statistical procedures are often described as being theoretically
"optimal." But they may not be "good" in practice in the sense of
correctly identifying ground truth with high probability.

To obtain high probability of discrimination, we recommend
(1) measuring suitable characteristics of probability models of
the data, (2) treating the measured characteristics as new data, and
estimating the likelihood function
$p(\text{measured characteristics of data} \mid \theta)$, and
(3) determining characteristics whose distributions for different values
of $\theta$ are as wide apart as possible [the likelihood function is not
flat and its optimum is easily determined].

This paper investigates the use of quantile data analysis to
obtain measured characteristics of image data which have good power
of discrimination between different values of ground truth. Only
univariate analysis methods are used on channel 2 and channel 3 spectral
observations. Future research will be concerned with bivariate analysis
of the joint distribution of channel 2 and channel 3 measurements. Our
approach to quantile data analysis strongly recommends that bivariate
analysis be built on a foundation of univariate analysis. Therefore
the univariate analysis techniques developed in this paper will not be
rendered obsolete by the bivariate techniques to be developed in future
research.

2. Outline of Quantile-Data Analysis of a Pixel

Let us describe a proposed method of statistical data analysis
based on characteristics of the sample quantile functions of batches of
measurements. Given a pixel whose ground truth we would like to
estimate, let $(t_1, t_2)$ be its coordinates, which represent its
position within the rectangular grid of pixels into which the scene has
been divided.

Define $A_\nu(t_1, t_2)$, the $\nu$-neighborhood of a pixel, to be the
set of pixels with coordinates $(t_1 + j_1, t_2 + j_2)$, where
$j_1, j_2 = 0, \pm 1, \ldots, \pm\nu$. For example, $A_1(t_1, t_2)$
contains 9 pixels, $A_2(t_1, t_2)$ contains 25 pixels, and
$A_3(t_1, t_2)$ contains 49 pixels.

For k = 2 and 3, the channel k measurements of the pixels in
$A_\nu(t_1, t_2)$ are collected to form a data batch whose sample
quantile function Q(u) is formed. The "measured data characteristics" we
associate with a pixel are various characteristics of the sample quantile
function of a batch of measurements formed from the pixels surrounding a
given pixel. The remainder of this section reviews quantile data analysis
and defines the summary statistics that it suggests.
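
The batch-forming step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is taken to be a list of rows holding one channel's values, the function name `neighborhood_batch` is hypothetical, and pixels near the border simply get a smaller batch, a choice the text does not specify.

```python
def neighborhood_batch(image, t1, t2, v):
    """Collect one channel's values over the v-neighborhood of pixel (t1, t2).

    The neighborhood holds offsets j1, j2 = 0, +-1, ..., +-v, so it has
    (2v + 1)**2 pixels away from the border. Clipping at the border is an
    assumption of this sketch.
    """
    rows, cols = len(image), len(image[0])
    return [image[t1 + j1][t2 + j2]
            for j1 in range(-v, v + 1)
            for j2 in range(-v, v + 1)
            if 0 <= t1 + j1 < rows and 0 <= t2 + j2 < cols]

# A 7 x 7 toy image whose value encodes the pixel position.
image = [[7 * r + c for c in range(7)] for r in range(7)]
sizes = [len(neighborhood_batch(image, 3, 3, v)) for v in (1, 2, 3)]
print(sizes)
```

For an interior pixel the batch sizes for v = 1, 2, 3 are 9, 25 and 49, matching the counts quoted above.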

The probability law of a random variable X is usually described
by its distribution function $F(x) = \Pr[X \le x]$,
$-\infty < x < \infty$, and probability density function
$f(x) = F'(x)$. The quantile approach uses [see, for example,
Parzen (1983)]

(1)  $Q(u) = F^{-1}(u) = \inf\{x : F(x) \ge u\}$,

(2)  $q(u) = Q'(u)$,

(3)  $fQ(u) = f(Q(u)) = \{q(u)\}^{-1}$, and

(4)  $J(u) = -(fQ)'(u)$.

A quick measure of location is the median Q(0.5). A quick index of
scale is the interquartile range Q(0.75) - Q(0.25), formed from the
quartiles Q(0.25) and Q(0.75).

Quick measures of distributional shape are provided by values (as
u tends to 0 and 1) of the informative quantile function [recently
introduced by Parzen]

$$IQ(u) = \frac{Q(u) - Q(0.5)}{2\{Q(0.75) - Q(0.25)\}} .$$

We cannot overemphasize how powerful the IQ function appears to be in
practice as a tool for the diagnosis of distributional shapes.

The IQ function is independent of location and scale parameters.
It is approximately equivalent to normalizing a quantile function to
have the properties Q(0.5) = 0, Q'(0.5) = 1. The graph of the IQ
function provides us at a glance with a rough estimate of tail behavior
as defined by tail exponents.
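
A minimal Python sketch of a sample informative quantile function follows, taking IQ(u) to be the quantile function centered at the median and divided by twice the interquartile range. The sample quantile used here interpolates linearly between order statistics; that convention and the function names are assumptions of the sketch, not necessarily those of the authors' programs.

```python
def interp_quantile(sorted_x, u):
    """Sample quantile by linear interpolation between order statistics."""
    pos = u * (len(sorted_x) - 1)
    i = int(pos)
    if i >= len(sorted_x) - 1:
        return float(sorted_x[-1])
    frac = pos - i
    return sorted_x[i] + frac * (sorted_x[i + 1] - sorted_x[i])

def informative_quantile(data, u):
    """IQ(u) = (Q(u) - Q(0.5)) / (2 * (Q(0.75) - Q(0.25)))."""
    xs = sorted(data)
    q = lambda p: interp_quantile(xs, p)
    return (q(u) - q(0.5)) / (2.0 * (q(0.75) - q(0.25)))

# An evenly spread (uniform-like) batch, and a shifted, rescaled copy.
batch = list(range(101))
rescaled = [3 * x + 7 for x in batch]
iq_hi = informative_quantile(batch, 0.99)
iq_hi_rescaled = informative_quantile(rescaled, 0.99)
print(iq_hi, iq_hi_rescaled)
```

For the evenly spread batch IQ(u) = u - 0.5, so IQ(0.99) = 0.49, and the rescaled copy gives the same value, which illustrates the independence of location and scale.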

A fundamental description of the tail behavior of distributions
is provided by the left tail exponent $\alpha_0$ and the right tail
exponent $\alpha_1$ defined as follows:

$$fQ(u) = u^{\alpha_0} L_0(u) \quad \text{as } u \to 0,$$

$$fQ(u) = (1-u)^{\alpha_1} L_1(u) \quad \text{as } u \to 1,$$

where $L_0(u)$ and $L_1(u)$ are slowly varying functions.

A function $L(u)$ is slowly varying as $u \to 0$ if, for every $y > 0$,

$$\lim_{u \to 0} \frac{L(yu)}{L(u)} = 1 .$$

Tail behavior is defined in terms of a tail exponent $\alpha$ as follows:

$\alpha < 1$: short tail

$\alpha = 1$: medium tail

$\alpha > 1$: long tail

Medium tail ($\alpha = 1$) distributions are further classified by the
value of

$$h = \lim_{u \to 0} \frac{fQ(u)}{u} \quad\text{or}\quad h = \lim_{u \to 1} \frac{fQ(u)}{1-u};$$

the letter h is suggested by the notion of hazard function. We define

$h = 0$: medium-long tail

$0 < h < \infty$: medium-medium tail

$h = \infty$: medium-short tail

Extensive calculations of informative quantile functions indicate
that the value $IQ_0$ of IQ(u) for u near 0 is a quick indicator of
left tail behavior:

$-0.5 \le IQ_0 < 0$: short left tail,

$-1.0 \le IQ_0 < -0.5$: medium-short left tail,

$IQ_0 < -1.0$: medium-medium to long left tail.

Similarly the value $IQ_1$ of IQ(u) for u near 1 is a quick indicator of
right tail behavior:

$0 < IQ_1 \le 0.5$: short right tail,

$0.5 < IQ_1 \le 1.0$: medium-short right tail,

$1.0 < IQ_1$: medium-medium to long right tail.
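
These thresholds translate directly into a small lookup. The Python sketch below is illustrative only; the function name `classify_tail` is hypothetical, and the right-tail boundaries are treated as mirror images of the left-tail ones, which is an assumption at the boundary values themselves.

```python
def classify_tail(iq_value, side):
    """Map IQ(u) near 0 (side='left') or near 1 (side='right') to a tail label."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    # Left-tail values are negative, right-tail values positive.
    magnitude = -iq_value if side == "left" else iq_value
    if magnitude <= 0.0:
        return "undetermined"          # outside the ranges given in the text
    if magnitude <= 0.5:
        return f"short {side} tail"
    if magnitude <= 1.0:
        return f"medium-short {side} tail"
    return f"medium-medium to long {side} tail"

print(classify_tail(-0.34, "left"))
print(classify_tail(2.8, "right"))
```

For instance, a value of -0.34 near u = 0 falls in the short left tail range, while a value of 2.8 near u = 1 falls in the medium-medium to long right tail range.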

An important family of distributions is the Weibull with shape
parameter $\beta$. Its quantile function Q(u) is of the form

$$Q(u) = \mu + \sigma Q_0(u)$$

where

$$Q_0(u) = \frac{1}{\beta}\{-\log(1-u)\}^{\beta} .$$

Its density-quantile function is

$$f_0 Q_0(u) = (1-u)\{-\log(1-u)\}^{1-\beta} .$$

Its right tail exponent is $\alpha_1 = 1$, and its left tail exponent is
$\alpha_0 = 1 - \beta$. Insight into the interpretation of informative
quantile functions is obtained by computing them for Weibull
distributions.
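
The Weibull formulas can be checked numerically. The Python sketch below takes the standard forms Q0(u) = (1/β){-log(1-u)}^β and f0Q0(u) = (1-u){-log(1-u)}^(1-β), confirms that f0Q0 is the reciprocal of the derivative of Q0, and estimates the left tail exponent from the slope of log f0Q0(u) against log u for small u. Function names and the choice β = 0.7 for the check are assumptions of the sketch.

```python
import math

def weibull_Q0(u, beta):
    """Standard Weibull quantile function (location 0, scale 1)."""
    return (1.0 / beta) * (-math.log(1.0 - u)) ** beta

def weibull_f0Q0(u, beta):
    """Standard Weibull density-quantile function."""
    return (1.0 - u) * (-math.log(1.0 - u)) ** (1.0 - beta)

def numeric_density_quantile(u, beta, h=1e-6):
    """1 / q(u), with q(u) = Q0'(u) approximated by a central difference."""
    q = (weibull_Q0(u + h, beta) - weibull_Q0(u - h, beta)) / (2.0 * h)
    return 1.0 / q

beta = 0.7
checks = [(weibull_f0Q0(u, beta), numeric_density_quantile(u, beta))
          for u in (0.1, 0.5, 0.9)]

# Left tail exponent: log f0Q0(u) / log u for a very small u.
u_small = 1e-8
left_exponent = math.log(weibull_f0Q0(u_small, beta)) / math.log(u_small)
print(checks, left_exponent)
```

With β = 0.7 the estimated left tail exponent is close to 1 - β = 0.3, in agreement with the tail exponents stated above.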

Given data, we distinguish three types of estimators of population
parameters, which we call: (1) fully non-parametric, (2) fully
parametric, and (3) functional-parametric. Fully non-parametric
estimators assume no model, and provide quick estimators. Fully
parametric estimators assume a model known up to a finite number of
parameters which must be estimated. Functional-parametric estimators
are based on methods of functional statistical inference.

A fully non-parametric estimator $\tilde{Q}(u)$ of $Q(u)$, given a
sample of n distinct values $X_{1;n} < X_{2;n} < \cdots < X_{n;n}$, is
defined by (for $j = 1, \ldots, n$)

$$\tilde{Q}(u) = X_{j;n}, \qquad \frac{j-1}{n} < u \le \frac{j}{n} .$$

For a large sample, or for grouped values, we form a histogram before
computing $\tilde{Q}(u)$ by linear interpolation at an equi-spaced grid
of values $kh$, $k = 1, 2, \ldots, [1/h]$, where usually $h = 0.01$.
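
The step-function estimator can be written down directly: the sample quantile at u is the j-th order statistic, where j is the smallest integer with u ≤ j/n. The Python sketch below is an illustration with hypothetical names; it evaluates the estimator on the grid u = kh with h = 0.01, but it does not reproduce the histogram-and-interpolation step used for large or grouped samples.

```python
import math

def sample_quantile(values, u):
    """Fully non-parametric quantile: X_(j) for (j-1)/n < u <= j/n."""
    if not 0.0 < u <= 1.0:
        raise ValueError("u must lie in (0, 1]")
    xs = sorted(values)
    n = len(xs)
    # Smallest j with u <= j/n. Floating-point rounding of u*n can shift j
    # by one when u*n is very close to an integer.
    j = max(1, math.ceil(u * n))
    return xs[j - 1]

batch = [7, 1, 5, 3]
grid = [sample_quantile(batch, k * 0.01) for k in range(1, 101)]  # u = kh, h = 0.01
print(sample_quantile(batch, 0.25), sample_quantile(batch, 0.26),
      sample_quantile(batch, 0.5), sample_quantile(batch, 1.0))
```

With the four values 1, 3, 5, 7 the estimator returns 1 at u = 0.25, jumps to 3 just above it, and reaches 7 at u = 1.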

3.  Example  and  Interpretation  of  a Quantile  Data  Analysis 

To  illustrate  the  uses  of  measured  data  characteristics  provided 
by  quantile  data  analysis,  let  us  consider  the  analysis  of  a training 
file  which  contains  both  image  data  and  ground  truth  data.  We  search 
through  the  ground  truth  file  to  see  what  codes  appear  more  than  a few 
times.  The  codes  found  to  be  present  corresponded  to  the  ground  truth 
values  listed  in  Table  A.  For  a ground  truth  value  j,  we  created  a 
data  batch  consisting  of  all  the  channel  2 values  observed  in  a pixel 
one  of  whose  sub-pixels  had  a ground  truth  equal  to  the  value  j.  We 
created  a similar  data  batch  of  channel  3 observations.  Table  A lists 
the  sample  sizes  of  the  number  of  observations  in  these  data  batches 
and  the  medians  and  interquartile  ranges  of  the  channel  2 and  channel 
3 observations. One immediately sees a pattern which might provide a
discrimination statistic $\Delta$ to be used in determining ground truth.

One might be able to readily distinguish the category "grass, pasture,
trees" from "corn, soybeans, sugar beets, spring wheat, spring oats"
by the values of

$$\Delta_1 = \text{median (channel 3)} - \text{median (channel 2)},$$

$$\Delta_2 = \log \frac{\text{IQ range (channel 3)}}{\text{IQ range (channel 2)}} .$$

The values of these statistics are given in Table A. Note that
$\Delta_1 > 2$ for grass, pasture, and trees, and $\Delta_1 < 2$ for
crops. Of the crops, alfalfa is closest in statistical characteristics
to grass, pasture, and trees; this conclusion is reached also in Table B.
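
The two statistics can be recomputed from the medians and interquartile ranges listed in Table A. The short Python sketch below does so, reading the second statistic as the natural logarithm of the ratio of interquartile ranges, a reading that reproduces the tabulated values to two decimals. The dictionary layout and variable names are illustrative assumptions.

```python
import math

# (median ch. 2, median ch. 3, IQ range ch. 2, IQ range ch. 3) from Table A
table_a = {
    "Alfalfa":      (19, 20, 9, 16.75),
    "Corn":         (15, 14, 5, 6.5),
    "Soybeans":     (15, 13, 5, 8),
    "Sugar Beets":  (14, 12, 4, 4.5),
    "Spring Wheat": (18, 16, 6, 9),
    "Spring Oats":  (18, 16, 8, 11),
    "Grass":        (23, 26, 8, 12.5),
    "Pasture":      (21, 28, 5, 13),
    "Trees":        (20, 24, 6, 11),
}

# Difference of medians, and natural log of the ratio of IQ ranges.
delta1 = {k: m3 - m2 for k, (m2, m3, _, _) in table_a.items()}
delta2 = {k: math.log(r3 / r2) for k, (_, _, r2, r3) in table_a.items()}

non_crops = {"Grass", "Pasture", "Trees"}
for name in table_a:
    group = "grass/pasture/trees" if delta1[name] > 2 else "crop"
    print(f"{name:13s} {delta1[name]:3d} {delta2[name]:5.2f}  {group}")
```

In this one training file the rule "difference of medians greater than 2" separates grass, pasture and trees from the crops, as noted in the text.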

Table A lists statistics based on comparisons of location and
scale estimators; Table B lists discriminators which are based on shape
and tail characteristics. We consider the following four
characteristics as statistics which might discriminate between (ground
truth) distributions:

$$\Delta_3 = \text{MEAN IQ} = \frac{\text{MEAN} - \text{MEDIAN}}{2 \times \text{INTERQUARTILE RANGE}}$$

$$\Delta_4 = \text{STD IQ} = \frac{\text{STANDARD DEVIATION}}{2 \times \text{INTERQUARTILE RANGE}}$$

$$\Delta_5 = IQ_0 = IQ(u) \text{ for } u \text{ near } 0$$

$$\Delta_6 = IQ_1 = IQ(u) \text{ for } u \text{ near } 1$$

The values of these statistics in this example indicate that trees,
pasture, and grass have spectral observations with distributions closer
to normal, while crops have spectral observations with distributions
further from normal.

It  should  be  strongly  emphasized  that  these  empirical  patterns 
found  in  one  file  are  not  intended  to  be  general  algorithms  applicable 
to  all  files.  They  are  presented  only  as  an  illustration  of  the  kinds 
of  facts  about  image  data  which  quantile  data  analysis  proposes  to 
discover  through  extensive  computation  on  training  files. 


TABLE A

                Sample    Median      Median      Median(3)
                Size      Channel 2   Channel 3   - Median(2)

Alfalfa            377       19          20           1
Corn             8,755       15          14          -1
Soybeans        11,000       15          13          -2
Sugar Beets        793       14          12          -2
Spring Wheat     2,296       18          16          -2
Spring Oats        558       18          16          -2
Grass              174       23          26           3
Pasture            248       21          28           7
Trees               95       20          24           4

                IQ Range    IQ Range    Log IQ(3)
                Channel 2   Channel 3   - Log IQ(2)

Alfalfa             9         16.75        .62
Corn                5          6.5         .26
Soybeans            5          8           .47
Sugar Beets         4          4.5         .12
Spring Wheat        6          9           .41
Spring Oats         8         11           .32
Grass               8         12.5         .45
Pasture             5         13           .96
Trees               6         11           .61



TABLE B

                Mean IQ     Mean IQ     STD IQ      STD IQ
                Channel 2   Channel 3   Channel 2   Channel 3

Alfalfa           -.08        -.07        .32         .27
Trees             -.08        -.01        .38         .35
Pasture           -.04        -.06        .41         .32
Grass             -.01         .02        .36         .34
-----------------------------------------------------------
Spring Wheat       .07         .11        .38         .41
Spring Oats        .09         .12        .36         .35
Sugar Beets        .14         .06        .42         .49
Corn               .17         .10        .44         .51
Soybeans           .17         .13        .46         .41

                IQ_0        IQ_0        IQ_1        IQ_1
                Channel 2   Channel 3   Channel 2   Channel 3

Alfalfa           -.34        -.32       1.05         .68
Trees             -.75        -.72       1.0          .68
Pasture          -1.0         -.76       1.1          .65
Grass             -.75     (illegible)    .81         .68
-----------------------------------------------------------
Spring Wheat      -.58        -.44       2.08        1.66
Spring Oats       -.43        -.36       1.18     (illegible; reads 1--0)
Sugar Beets       -.37        -.44       2.25        2.55
Corn              -.40        -.46       2.8         3.0
Soybeans          -.40        -.31       2.7         1.93

Note: STD IQ = .37 for normal. Above-line characteristics are close to
normal. Below-line characteristics are far from normal.
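
The reference value quoted in the note can be verified with the Python standard library: for a normal distribution, the standard deviation divided by twice the interquartile range is about 0.37, and the mean of the IQ function is 0 because mean and median coincide. The variable names below are illustrative.

```python
from statistics import NormalDist

normal = NormalDist(mu=0.0, sigma=1.0)
iq_range = normal.inv_cdf(0.75) - normal.inv_cdf(0.25)   # about 1.349 sigma

std_iq_normal = normal.stdev / (2.0 * iq_range)          # STD IQ for a normal law
mean_iq_normal = (normal.mean - normal.median) / (2.0 * iq_range)

print(round(std_iq_normal, 2), mean_iq_normal)
```

Entries of Table B can then be read against these two reference values, 0.37 for STD IQ and 0 for Mean IQ.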




4. Quantile Data Analysis of Statistical
Characteristics Estimated from Pixel Neighborhoods

A program of fundamental research on the quantile data analysis
approach to picture segmentation poses many detailed problems for
research. This section gives an example of one sample quantile data
analysis. (1) Consider all pixels in a file whose ground truth is a
specified crop (spring wheat is considered here). (2) For each such
pixel form a 5 by 5 neighborhood of pixels (with the specified pixel
at the center). (3) For each neighborhood form a data batch of
spectral observations (channels 2 and 3 are considered here). (4) For
each data batch, form its sample quantile function and estimate typical
univariate quantile-theoretic statistical characteristics: median, IQR
(interquartile range), mean IQ (mean of informative quantile function),
STD IQ (standard deviation of IQ function), IQ(.01), IQ(.99), average
log spacings (which is a fully non-parametric estimator of entropy of
the IQ function), and $\log \sigma_0$ [where $\sigma_0$ is the score
deviation, defined as the sum of products of the spacings of the IQ
function and a specified density-quantile function $f_0 Q_0(u)$]. The
specified density-quantile functions that we use are the logistic
distribution

$$f_0 Q_0(u) = u(1-u),$$

and the Weibull distribution with quantile shape parameter $\beta$ [we
choose $\beta = 0.7$]

$$f_0 Q_0(u) = (1-u)\{-\log(1-u)\}^{1-\beta} .$$

Step (5) is to form, for each statistical characteristic, a
data batch of the several thousand estimates of that characteristic
corresponding to the pixels in the training file with the specified
ground truth [here, spring wheat (code 100)]. Step (6) is to do a
one-sample quantile data analysis of this data batch. These analyses
are presented in detail for the following estimators: median channel 2,
mean IQ channel 2, median channel 3, mean IQ channel 3.

The following table lists some basic summary measures for a
one-sample statistical analysis of a data batch of statistical
characteristics of Spring Wheat pixel neighborhoods:

                     Median      Median      Mean IQ     Mean IQ
                     Channel 2   Channel 3   Channel 2   Channel 3

Median                 18          16          .02         .04
IQR                     5           7.75       .14         .13
Mean IQ                .04         .09         .03         .01
Std IQ                 .41         .36         .44         .49
Av. Log Spacings      -.68        -.59         .43         .43
sigma_0 Weibull        .67         .61         .78         .80
sigma_0 Logistic       .34         .20         .22         .22


We give for these variables: (1) printer plots of the
informative quantile functions; (2) estimated density-quantile
functions, computed by the method of autoregressive density estimation
using the logistic and Weibull distributions as bases; and (3)
diagnostic distribution functions (to be compared with the uniform)
that help us decide which autoregressive order to accept as providing
a "parsimonious" estimator.

Spring Wheat Pixel Neighborhood Channel 2 Medians: IQ plot indicates
not normal but possibly Weibull. To test Weibull, we do not currently
estimate the shape parameter $\beta$, but choose $\beta = 0.7$.

Spring Wheat Pixel Neighborhood Channel 2 Medians: Autoregressive
density-quantile estimator (with logistic base and order 3) indicates
bimodal density.

Spring Wheat Pixel Neighborhood Channel 2 Medians: Diagnostic of fit of
AR density-quantile estimator (with logistic base and order 3).

Spring Wheat Pixel Neighborhood Channel 2 Medians: Autoregressive
density-quantile analysis (with Weibull shape parameter .7 base and
order 2) indicates bimodal density.

Spring Wheat Pixel Neighborhood Channel 2 Medians: Diagnostic of fit of
AR density-quantile estimator (with Weibull shape parameter .7 base and
order 2).

AUTOREGRESSIVE PARAMETRIC SELECT ANALYSIS

Mean IQ Logistic Base

SQUARED MODULUS OF FOURIER COEFFICIENTS

PHI2( 1) = 0.00099923
PHI2( 2) = 0.00039880
PHI2( 3) = 0.00049510
PHI2( 4) = 0.00095499
PHI2( 5) = 0.00067322


AUTOREGRESSIVE PARAMETRIC SELECT ANALYSIS

Mean IQ Weibull Case

SQUARED MODULUS OF FOURIER COEFFICIENTS


Spring Wheat Pixel Neighborhood Channel 2 Mean IQ: Pseudo-correlations
squared modulus (phi 2) accept logistic distribution, reject Weibull
distribution fit.

Spring Wheat Pixel Neighborhood Channel 2 Mean IQ: Cumulative weighted
spacings D(u) plot indicates accept fit of logistic distribution.

... distribution fit.

Spring Wheat Pixel Neighborhood Channel 2 Mean IQ: Autoregressive
density-quantile estimator (with logistic base and order 1) indicates
normal-like density.

Spring Wheat Pixel Neighborhood Channel 2 Mean IQ: Autoregressive
density-quantile estimator (with Weibull base and order 1) indicates
density which is symmetric and unimodal but less normal-like.

AUTOREGRESSIVE PARAMETRIC SELECT ANALYSIS

Median Channel 3 Logistic

SQUARED MODULUS OF FOURIER COEFFICIENTS

PHI2( 1) = 0.03838847
PHI2( 2) = 0.00517420
PHI2( 3) = 0.01129055
PHI2( 4) = 0.00169468
PHI2( 5) = 0.01800444


AUTOREGRESSIVE PARAMETRIC SELECT ANALYSIS

Median Channel 3 Weibull

SQUARED MODULUS OF FOURIER COEFFICIENTS

PHI2( 1) = 0.00177358
PHI2( 2) = 0.00257716
PHI2( 3) = 0.00229294
PHI2( 4) = 0.00134046
PHI2( 5) = 0.01826305


Spring Wheat Pixel Neighborhood Channel 3 Medians: Pseudo-correlations
squared modulus (phi 2) accept Weibull distribution, reject logistic
distribution fit.

Spring Wheat Pixel Neighborhood Channel 3 Medians: Cumulative weighted
spacings D(u) plot indicates accept fit of Weibull distribution (shape
parameter $\beta = 0.7$).

Spring Wheat Pixel Neighborhood Channel 3 Medians: Cumulative weighted
spacings D(u) plot indicates reject fit of logistic distribution.

[Printer plot: SEG=1380 YR=1978 DY=115 CH=3 GT=100. DENSITY-QUANTILE
FUNCTION, WEIBULL CASE, ORDER = 1.]

Spring Wheat Pixel Neighborhood Channel 3 Medians: Autoregressive
density-quantile estimator (with Weibull base and order 1).

Spring Wheat Pixel Neighborhood Channel Medians: Autoregressive
density-quantile estimator (with logistic base and order 1).

SUMMARY OF AR PARAMETRIC SELECT ANALYSIS FOR LOGISTIC CASE

Median Channel 3

ORDER    RES_VAR    LOG(RES_VAR)      AIC         CAT

  1      0.96161     -0.03914      -0.01914    -1.01922
  2      0.94997     -0.05133      -0.01133    -1.01100
  3      0.93683     -0.06525      -0.00525    -1.00444
  4      0.92880     -0.07387       0.00613    -0.99229
  5      0.91700     -0.08665       0.01335    -0.98433

OPTIMAL ORDER BY CAT CRITERION IS 1    MAXIMUM ORDER CHECKED IS 5
OPTIMAL ORDER BY AIC CRITERION IS 1    MAXIMUM ORDER CHECKED IS 5


SUMMARY OF AR PARAMETRIC SELECT ANALYSIS FOR WEIBULL CASE

Median Channel 3

ORDER    RES_VAR    LOG(RES_VAR)      AIC         CAT

  1      0.99823     -0.00177       0.01823    -0.98184
  2      0.99556     -0.00445       0.03555    -0.96461
  3      0.99366     -0.00636       0.05364    -0.94667
  4      0.99218     -0.00785       0.07215    -0.92837
  5      0.97480     -0.02552       0.07448    -0.92561

OPTIMAL ORDER BY CAT CRITERION IS 0    MAXIMUM ORDER CHECKED IS 5
OPTIMAL ORDER BY AIC CRITERION IS 0    MAXIMUM ORDER CHECKED IS 5


Spring  Wheat  Pixel  Neighborhood  Channel  3 Medians:  AIC  AR  order 

determining  analysis. 
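
The AIC column of these summaries can be reproduced from the residual variances. In the Python sketch below the criterion is taken to be log(RES_VAR) + 2m/n for order m. The value n = 100 is not stated in the text; it is inferred here from the listings, where AIC exceeds LOG(RES_VAR) by 0.02 per order. Order 0 is given AIC = 0, and the third logistic residual variance is read as 0.93683, consistent with its tabulated logarithm. Function names are hypothetical.

```python
import math

def aic_values(res_var, n):
    """AIC(m) = log(residual variance of order m) + 2m/n, for m = 1, 2, ..."""
    return [math.log(rv) + 2.0 * m / n for m, rv in enumerate(res_var, start=1)]

def best_order(aics):
    """Order minimizing AIC, with order 0 assigned AIC = 0."""
    best, best_val = 0, 0.0
    for m, a in enumerate(aics, start=1):
        if a < best_val:
            best, best_val = m, a
    return best

n = 100  # assumed: inferred from the 0.02-per-order penalty in the listings
logistic = [0.96161, 0.94997, 0.93683, 0.92880, 0.91700]   # Median Channel 3
weibull = [0.99823, 0.99556, 0.99366, 0.99218, 0.97480]    # Median Channel 3

aic_logistic = aic_values(logistic, n)
aic_weibull = aic_values(weibull, n)
print(best_order(aic_logistic), best_order(aic_weibull))
```

Under these assumptions the selected orders are 1 for the logistic base and 0 for the Weibull base, as in the printed summaries.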

Spring Wheat Pixel Neighborhood Channel Mean IQ: IQ plot indicates
normality.

[Printer plot: RAW DISTRIBUTION D(U), LOGISTIC CASE.
D+ = 0.0999 AT U = 0.9900. D- = -0.3870 AT U =]

Spring Wheat Pixel Neighborhood Channel 3 Mean IQ: Cumulative weighted
spacings D(u) plot indicates accept fit of logistic distribution.

[Printer plot: SEG=1380 YR=1978 DY=115 CH=3 GT=100. DENSITY-QUANTILE
FUNCTION, LOGISTIC CASE, ORDER = 1. Abscissa is U, ordinate is FQ(U).]

Spring Wheat Pixel Neighborhood Channel 3 Mean IQ: Autoregressive
density-quantile estimator (with logistic base and order 1) indicates
normal-like density.




Spring Wheat Pixel Neighborhood Channel 3 Mean IQ: Autoregressive
density-quantile estimator (with Weibull base and order 2) indicates a
density not in accord with logistic analysis, thus casting doubt on
current reliability of AR order determining techniques.

[Printer plot: SEG=1380 YR=1978 DY=115 CH=3 GT=100. DBAR PLOTTED
AGAINST D(U)=U (*), WEIBULL.]

Spring Wheat Pixel Neighborhood Channel 3 Mean IQ: Diagnostic of fit of
AR density-quantile estimator (with Weibull base and order 2) indicates
that it "overfits" and might generate spurious modes in the density.

SUMMARY OF AR PARAMETRIC SELECT ANALYSIS FOR WEIBULL CASE

Mean IQ Channel 3

ORDER    RES_VAR    LOG(RES_VAR)      AIC         CAT

  1      0.90931     -0.09507      -0.07507    -1.07785
  2      0.88209     -0.12546      -0.08546    -1.08899
  3      0.86845     -0.14104      -0.08104    -1.08376
  4      0.86084     -0.14984      -0.06984    -1.07086
  5      0.85631     -0.15512      -0.05512    -1.05400

OPTIMAL ORDER BY CAT CRITERION IS 2    MAXIMUM ORDER CHECKED IS 5
OPTIMAL ORDER BY AIC CRITERION IS 2    MAXIMUM ORDER CHECKED IS 5


SUMMARY OF AR PARAMETRIC SELECT ANALYSIS FOR LOGISTIC CASE

Mean IQ Channel 3

ORDER    RES_VAR    LOG(RES_VAR)      AIC         CAT

  1      0.99882     -0.00118       0.01882    -0.98126
  2      0.99655     -0.00346       0.03654    -0.96365
  3      0.99392     -0.00610       0.05390    -0.94643
  4      0.99086     -0.00918       0.07082    -0.92966
  5      0.98756     -0.01251       0.08749    -0.91315

OPTIMAL ORDER BY CAT CRITERION IS 0    MAXIMUM ORDER CHECKED IS 5
OPTIMAL ORDER BY AIC CRITERION IS 0    MAXIMUM ORDER CHECKED IS 5


Spring  Wheat  Pixel  Neighborhood  Channel  3 Mean  IQ:  AIC  AR  order 

determining  analysis. 

Appendix: Quantile and FUN.STAT Data Analysis

This  appendix  presents  some  of  the  new  characterizations  of 
probability  laws  which  are  being  developed  under  the  names  of  quantile 
data  analysis,  and  functional  statistical  inference  analysis. 

Estimators  of  these  characteristics  are  currently  available  for  one 
sample  and  two  samples,  univariate  and  bivariate  [Parzen  (1979), 

(1983),  Woodfield  (1982)]. 

These  methods  seem  to  have  much  potential  to  contribute  to  the 
solution  of  the  problem  of  digital  image  representation:  the 
determination  and  modeling  of  basic  characteristics  or  features  of  the 
digital  image  which  can  be  incorporated  into  the  process  of  identifying 
classes  and  attributes  in  a scene.  They  provide  new  approaches  to 
determining  scene  probability  density  functions  and  class  conditional 

density  functions  of  digital  image  data  in  order  to  understand  spectral 
characteristics  and  extract  desired  information.  They  can  provide  data 
representations  which  reduce  the  dimensions  of  multivariate  image  data 
while  preserving  information  pertaining  to  scene  classes  and  attributes. 

A.  One  Sample:  Univariate 

Let X be a continuous random variable of which we observe a random
sample. To estimate the distribution function $F_X(x) = \Pr[X \le x]$ and
probability density $f_X(x) = F_X'(x)$, we estimate: the quantile function
$Q_X(u) = F_X^{-1}(u)$; the quantile density $q_X(u) = Q_X'(u)$; and the
density quantile $fQ_X(u) = f_X(Q_X(u))$. A quantile data analysis of the
random sample

1.  Forms the sample distribution function $\tilde F(x)$, sample quantile
function $\tilde Q(u)$, and sample quantile density $\tilde q(u)$ at a grid
of values of u in $0 < u < 1$.

2.  Plots the sample version of the informative quantile function

$$IQ(u) = \frac{Q(u) - Q(0.5)}{2\{Q(0.75) - Q(0.25)\}}$$

whose values as u tends to 0 and 1 indicate the tail
exponents of the probability law of X.

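3.  Determines standard distribution functions $F_0(x)$ to test

$$H_0: F(x) = F_0\!\left(\frac{x - \mu}{\sigma}\right) \quad \text{or} \quad Q(u) = \mu + \sigma Q_0(u)$$

for location and scale parameters $\mu$ and $\sigma$ to be estimated. A
test of $H_0$ which does not require estimation of $\mu$ and $\sigma$ can be
based on [Parzen (1979)]

$$\tilde d(u) = f_0Q_0(u)\,\tilde q(u) \div \tilde\sigma_0, \qquad \tilde\sigma_0 = \int_0^1 f_0Q_0(t)\,\tilde q(t)\,dt,$$

which estimate respectively

$$d(u) = f_0Q_0(u)\,q(u) \div \sigma_0, \qquad \sigma_0 = \int_0^1 f_0Q_0(t)\,q(t)\,dt.$$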


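4.  Forms successive autoregressive estimators

$$\hat d_m(u) = \hat K_m \left| 1 + \hat\alpha_m(1) e^{2\pi i u} + \cdots + \hat\alpha_m(m) e^{2\pi i u m} \right|^{-2}$$

whose negentropy

$$\hat H_m = \int_0^1 -\log \hat d_m(u)\,du = -\log \hat K_m$$

is used to determine optimal orders $\hat m$. Note that $\hat H_m$ estimates
the entropy difference

$$\Delta = \left\{ \log \sigma_0 - \int_0^1 \log f_0Q_0(u)\,du \right\} - \left\{ -\int_0^1 \log fQ(u)\,du \right\}.$$

5.  Estimates $fQ(u)$ by

$$\widehat{fQ}_m(u) = f_0Q_0(u) \div \tilde\sigma_0\, \hat d_m(u)$$

where m is chosen equal to an optimal order $\hat m$.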
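
Steps 1 and 2 of the quantile data analysis above can be sketched as follows. This is a minimal illustration, assuming the informative quantile function has the form IQ(u) = [Q(u) - Q(0.5)] / (2[Q(0.75) - Q(0.25)]) and assuming a piecewise-linear sample quantile function in which the k-th order statistic sits at u = k/(n-1); the interpolation convention and the function names are choices made for this sketch, not taken from the text.

```python
def sample_quantile(data, u):
    """Piecewise-linear sample quantile function Q~(u) for 0 <= u <= 1.

    Assumed convention: order statistic k (0-based) is placed at u = k/(n-1).
    """
    xs = sorted(data)
    n = len(xs)
    pos = u * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return xs[lo] + frac * (xs[hi] - xs[lo])

def informative_quantile(data, u):
    """IQ(u): quantile function centred at the median, scaled by twice the IQR."""
    median = sample_quantile(data, 0.5)
    iqr = sample_quantile(data, 0.75) - sample_quantile(data, 0.25)
    return (sample_quantile(data, u) - median) / (2.0 * iqr)

batch = list(range(1, 10))          # toy data batch: 1, 2, ..., 9
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
iq_values = [informative_quantile(batch, u) for u in grid]
```

By construction IQ(0.5) = 0 and IQ(0.75) - IQ(0.25) = 0.5 for any batch, so only the behavior of IQ(u) near u = 0 and u = 1 carries information about the tails.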

B.  Two  Sample:  Univariate 

Let X and Y be continuous random variables with random samples
$X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ respectively, and with respective
distribution functions $F(x) = \Pr[X \le x]$, $G(x) = \Pr[Y \le x]$. The
pooled sample $X_1, \ldots, X_m, Y_1, \ldots, Y_n$ can be regarded as a
random sample from the distribution function

$$H(x) = \lambda F(x) + (1 - \lambda) G(x), \qquad \lambda = \frac{m}{m+n}.$$

To test the hypothesis of equality of distributions, $H_0: F(x) =
G(x) = H(x)$, it is customary in non-parametric statistics to introduce

$$D_X(u) = F H^{-1}(u), \qquad D_Y(u) = G H^{-1}(u)$$

with densities [equivalent to likelihood ratios]

$$d_X(u) = \frac{f H^{-1}(u)}{h H^{-1}(u)}, \qquad d_Y(u) = \frac{g H^{-1}(u)}{h H^{-1}(u)}.$$

Note that $h H^{-1}(u) = \lambda f H^{-1}(u) + (1 - \lambda) g H^{-1}(u)$; therefore

$$d_X(u) = \left\{ \lambda + (1 - \lambda) \frac{g H^{-1}(u)}{f H^{-1}(u)} \right\}^{-1}.$$
Parzen (1983) shows that all conventional two-sample nonparametric
test procedures are functionals of the following raw estimator of
$D_X(u)$:

$$\tilde D_X(u) = \tilde F \tilde H^{-1}(u)$$

from which one can form "pseudo-correlations" $\tilde\rho(v)$ and linear rank
statistics $A(J)$ with score function $J(u)$,

$$\tilde\rho(v) = \int_0^1 e^{2\pi i u v}\, d\tilde D_X(u), \qquad A(J) = \int_0^1 J(u)\, d\tilde D_X(u),$$

and autoregressive estimators $\hat d_{X,m}(u)$ of $d_X(u)$.

When one observes several variables $X^{(1)}, X^{(2)}, \ldots, X^{(k)}$, one
estimates functionals of $D_j(u) = F_{X^{(j)}}(H^{-1}(u))$ or

$$D_{jk}(u) = F_{X^{(j)}}\big(F_{X^{(k)}}^{-1}(u)\big).$$
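
The raw two-sample estimator and a linear rank statistic of the kind described above can be sketched as follows. This is a minimal illustration, assuming samples without ties, evaluation of the raw estimator on the grid u = j/N with N = m + n, and a left-continuous sample quantile function (so that the pooled quantile at j/N is the j-th pooled order statistic). The function names are hypothetical.

```python
def raw_comparison_distribution(x, y):
    """Raw estimator of D_X(u) = F(H^{-1}(u)) at u = j/N, j = 1..N.

    The pooled sample quantile at j/N is taken to be the j-th pooled order
    statistic; the X-sample distribution function is evaluated there.
    """
    pooled = sorted(x + y)
    m = len(x)
    return [sum(1 for xi in x if xi <= t) / m for t in pooled]

def linear_rank_statistic(x, y, score):
    """Integral of J(u) with respect to the raw estimator of D_X.

    The raw estimator jumps by 1/m at u = rank/N for each X observation,
    so the integral is the mean score over the pooled ranks of the X's.
    Assumes no ties.
    """
    pooled = sorted(x + y)
    n_total = len(pooled)
    ranks = [pooled.index(xi) + 1 for xi in x]
    return sum(score(r / n_total) for r in ranks) / len(x)

x_sample = [1.0, 2.0, 3.0]
y_sample = [4.0, 5.0, 6.0]
d_raw = raw_comparison_distribution(x_sample, y_sample)
wilcoxon_type = linear_rank_statistic(x_sample, y_sample, lambda u: u)
```

With the score function J(u) = u the statistic is a rescaled Wilcoxon rank sum; other score functions give other members of the family of linear rank statistics.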


C.  One  Sample:  Bivariate 

Let $(X_1, X_2)$ be jointly continuous random variables with
distribution function $F_{X_1,X_2}(x_1,x_2) = \Pr[X_1 \le x_1, X_2 \le x_2]$ and
density $f_{X_1,X_2}(x_1,x_2)$. The joint density quantile function is
defined by

$$fQ_{X_1,X_2}(u_1,u_2) = f_{X_1,X_2}(Q_{X_1}(u_1), Q_{X_2}(u_2)).$$

To estimate $fQ_{X_1,X_2}$ we define

$$D_{X_1,X_2}(u_1,u_2) = F_{X_1,X_2}(Q_{X_1}(u_1), Q_{X_2}(u_2))$$

which is the distribution function of $U_1 = F_{X_1}(X_1)$, $U_2 = F_{X_2}(X_2)$;
it has density

$$d_{X_1,X_2}(u_1,u_2) = \frac{\partial^2}{\partial u_1\, \partial u_2} D_{X_1,X_2}(u_1,u_2)$$

satisfying

$$fQ_{X_1,X_2}(u_1,u_2) = fQ_{X_1}(u_1)\, fQ_{X_2}(u_2)\, d_{X_1,X_2}(u_1,u_2).$$

To estimate $d_{X_1,X_2}$ from a random sample $(X_{1j}, X_{2j})$,
$j = 1, \ldots, n$, form

$$\tilde D_{X_1,X_2}(u_1,u_2) = \tilde F_{X_1,X_2}(\tilde Q_{X_1}(u_1), \tilde Q_{X_2}(u_2))$$

and a raw estimator $\tilde d_{X_1,X_2}(u_1,u_2)$. We smooth
$\log \tilde d_{X_1,X_2}(u_1,u_2)$ by a smooth estimator
$\log \hat d_{X_1,X_2}(u_1,u_2)$ minimizing a criterion similar to

$$\sum_j \left| \log \hat d(U_{1j}, U_{2j}) - \log \tilde d(U_{1j}, U_{2j}) \right|^2$$

where $\log \hat d_m(u_1,u_2)$ has the parametric representation

$$\log \hat d_m(u_1,u_2) = \sum_{v_1,v_2} \theta_{v_1,v_2} \exp\{2\pi i (u_1 v_1 + u_2 v_2)\} - \psi(\theta_{v_1,v_2}),$$

where the summation is over $v_1, v_2 = 0, \pm 1, \ldots, \pm m$, and
$\psi(\theta_{v_1,v_2})$ is an integrating factor to make $\hat d_m(u_1,u_2)$ a
probability density. The foregoing estimators have been implemented in
T. J. Woodfield [1982]. The problem of choosing a best value of the
order m is approached by evaluating the entropy of $\hat d_m$.
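
The raw estimator of the bivariate dependence distribution described in this section can be sketched as follows. This is a minimal illustration, assuming no ties and evaluation only at grid points u = k/n, where the rank transform U = rank/n reproduces the sample distribution function evaluated at the sample quantiles. The function names are hypothetical, and the smoothing step (the exponential model of order m) is not reproduced.

```python
def rank_transform(values):
    """U_j = (rank of value j) / n, the sample version of F(X). Assumes no ties."""
    order = sorted(values)
    n = len(values)
    return [(order.index(v) + 1) / n for v in values]

def raw_dependence_distribution(x1, x2, u1, u2):
    """Raw estimate of D(u1, u2): fraction of pairs with U1 <= u1 and U2 <= u2.

    Intended for grid points u1, u2 of the form k/n.
    """
    r1, r2 = rank_transform(x1), rank_transform(x2)
    hits = sum(1 for a, b in zip(r1, r2) if a <= u1 and b <= u2)
    return hits / len(x1)

# Perfect positive and perfect negative dependence on four observations.
first = [1.0, 2.0, 3.0, 4.0]
same_order = [10.0, 20.0, 30.0, 40.0]
reverse_order = [40.0, 30.0, 20.0, 10.0]

d_pos = raw_dependence_distribution(first, same_order, 0.5, 0.5)
d_neg = raw_dependence_distribution(first, reverse_order, 0.5, 0.5)
```

Under independence D(u1, u2) would equal the product u1 u2 (0.25 at the point used here), so the two toy cases sit on either side of that value.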
D.  Two  Samples:  Bivariate

Let $(X_1, X_2)$ and $(Y_1, Y_2)$ be random vectors with respective
distribution functions $F(x_1,x_2)$ and $G(y_1,y_2)$, and respective random
samples

$$(X_1(j), X_2(j)),\ j = 1, 2, \ldots, m, \quad \text{and} \quad (Y_1(k), Y_2(k)),\ k = 1, 2, \ldots, n.$$

Let $H(x_1,x_2)$ denote the distribution function of the pooled random
sample, with marginal distribution functions $H_1(x_1)$ and $H_2(x_2)$. Define

$$D_1(u_1,u_2) = F(H_1^{-1}(u_1), H_2^{-1}(u_2)),$$

$$D_2(u_1,u_2) = G(H_1^{-1}(u_1), H_2^{-1}(u_2)).$$

From $\tilde D_1(u_1,u_2)$ and $\tilde D_2(u_1,u_2)$ one can form raw estimators
$\tilde d_1(u_1,u_2)$ and $\tilde d_2(u_1,u_2)$ of the densities

$$d_1(u_1,u_2) = \frac{f(H_1^{-1}(u_1), H_2^{-1}(u_2))}{h_1 H_1^{-1}(u_1)\; h_2 H_2^{-1}(u_2)},$$

$$d_2(u_1,u_2) = \frac{g(H_1^{-1}(u_1), H_2^{-1}(u_2))}{h_1 H_1^{-1}(u_1)\; h_2 H_2^{-1}(u_2)}.$$

Therefore

$$\log d_1(u_1,u_2) - \log d_2(u_1,u_2) = \log f(H_1^{-1}(u_1), H_2^{-1}(u_2)) - \log g(H_1^{-1}(u_1), H_2^{-1}(u_2)).$$

The likelihood ratio $f(x_1,x_2)/g(x_1,x_2)$ can be effectively estimated by
estimating $\log d_1(u_1,u_2) - \log d_2(u_1,u_2)$. It seems most natural to
estimate [using exponential model representations]

$$\log d_1(u_1,u_2) - \log d_{11}(u_1) - \log d_{12}(u_2)$$

where $d_{11}(u_1)$ and $d_{12}(u_2)$ are the marginal densities of $d_1(u_1,u_2)$,
which are estimated separately by methods of two samples: univariate.

The final outputs are contour plots of the classification statistic

$$L(x_1,x_2) = \log f(x_1,x_2) - \log g(x_1,x_2).$$

A point $(x_1,x_2)$ is classified in population 1 or 2 by whether $L(x_1,x_2)$
exceeds a threshold which depends on the prior probabilities and loss
function.
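
The final classification step can be sketched as follows. The text does not give the threshold explicitly; the sketch assumes the standard minimum-expected-loss rule for two populations, in which a point is assigned to population 1 when L exceeds log[(prior2 x loss of calling a population-2 point 1) / (prior1 x loss of calling a population-1 point 2)]. Function and parameter names are hypothetical.

```python
import math

def bayes_threshold(prior1, prior2, loss_1_as_2=1.0, loss_2_as_1=1.0):
    """Threshold t such that classifying to population 1 when L > t
    minimizes expected loss.

    loss_1_as_2: loss when a population-1 point is classified as 2.
    loss_2_as_1: loss when a population-2 point is classified as 1.
    """
    return math.log((prior2 * loss_2_as_1) / (prior1 * loss_1_as_2))

def classify(log_f, log_g, prior1=0.5, prior2=0.5,
             loss_1_as_2=1.0, loss_2_as_1=1.0):
    """Return 1 or 2 according to whether L = log f - log g exceeds the threshold."""
    statistic = log_f - log_g
    threshold = bayes_threshold(prior1, prior2, loss_1_as_2, loss_2_as_1)
    return 1 if statistic > threshold else 2

equal_prior_label = classify(-1.0, -2.0)                          # L = 1, threshold 0
rare_pop1_label = classify(-1.0, -2.0, prior1=0.1, prior2=0.9)    # threshold log 9
```

With equal priors and equal losses the threshold is zero, so the contour L = 0 is the decision boundary; unequal priors or losses simply shift to a different contour of the same plot.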
Appendix:  Exploratory  Quantile  Data  Analysis  of  Training  Files

The  basic  tool  for  determining  statistical  characteristics  that 
are  good  discriminators  is  to  determine  (for  each  file,  ground  truth, 
and  channel)  a data  batch  of  measurements  in  the  specified  channel  on 
all  pixels  with  the  specified  ground  truth.  The  statistical 
characteristics  of  these  data  batches  are  summarized  (as  on  the 
attached  pages)  and  studied  to  determine  patterns  which  can 
discriminate  between  different  ground  truths.  The  file  numbers  are 
those  used  in  the  Fundamental  Research  Data  Base  [see  Guseman  (1983)]. 


STATISTICAL ANALYSIS SYSTEM     21:09 TUESDAY, MAY 17, 1983

OBS  FILE  CHANNEL  GRNDTRTH      N     Q25  Q50      Q75      IQR  [illegible]  [illegible]
  1     1        2        90    377  15.000   19  24.0000   9.0000      0.08564     0.322475
  2     1        2        92   8755  14.000   15  19.0000   5.0000      0.17327     0.442017
  3     1        2        97  10573  14.000   15  19.0000   5.0000      0.17437     0.463493
  4     1        2        98    793  13.000   14  17.0000   4.0000      0.14415     0.416479
  5     1        2       100   2296  15.000   18  21.0000   6.0000      0.07333     0.448731
  6     1        2       104    558  15.000   18  23.0000   8.0000      0.09492     0.382529
  7     1        2       111    174  19.000   23  27.0000   8.0000     -0.01331     0.326849
  8     1        2       113    248  19.000   21  24.0000   5.0000     -0.00419     0.407626
  9     1        2       114     95  18.000   20  24.0000   6.0000      0.08244     0.381941
 10     1        2       164   2980  15.000   19  22.0000   7.0000      0.00592     0.373707
 11     1        2       242   1326  15.000   19  21.0000   6.0000     -0.00299     0.367563
 12     1        3        90    377  14.000   20  30.7500  16.7500      0.07147     0.274792
 13     1        3        92   8755  11.500   14  18.0000   6.5000      0.09520     0.506777
 14     1        3        97  10573  10.000   13  18.0000   8.0000      0.13187     0.407418
 15     1        3        98    793   9.500   12  14.0000   4.5000      0.06506     0.492558
 16     1        3       100   2296  12.500   16  21.5000   9.0000      0.11257     0.407630
 17     1        3       104    558  13.000   16  24.0000  11.0000      0.11668     0.345959
 18     1        3       111    174  20.500   26  33.0000  12.5000      0.02218     0.339094
 19     1        3       113    248  20.000   28  33.0000  13.0000     -0.05665     0.319116
 20     1        3       114     95  18.000   24  29.0000  11.0000     -0.01359     0.347879
 21     1        3       164   2980  13.000   18  24.0000  11.0000      0.05420     0.367705
 22     1        3       242   1326  14.500   20  26.0000  11.5000      0.03849     0.350681
 23     6        2        19     84  26.000   27  29.0000   3.0000      0.10974     0.422269
 24     6        2        20     84  24.025   27  28.9375   4.9125     -0.02021     0.313054
 25     6        2        21     68  26.780   27  29.0000   2.2200      0.30070     0.676187
 26     6        2        22    138  26.000   28  30.0000   4.0000      0.05138     0.400085
 27     6        2        24     75  29.000   30  33.0000   4.0000      0.08499     0.314726
 28     6        2        25     98  31.880   33  36.0000   4.1200      0.02669     0.411211
 29     6        2        26     59  28.700   29  32.0000   3.3000      0.08760     0.436342
 30     6        2        27     66  29.000   29  31.0000   2.0000      0.17336     0.514212
 31     6        2        29     90  26.920   29  30.0000   3.0800     -0.14964     0.638682
 32     6        2        30    147  30.000   33  36.0000   6.0000     -0.00780     0.289214
 33     6        2        80    262  29.000   33  35.0000   6.0000     -0.10390     0.357624
 34     6        2        90    110  27.000   29  31.0000   4.0000      0.04559     0.376118
 35     6        2        92     70  30.000   32  33.0000   3.0000     -0.07507     0.374930
 36     6        2        94    719  29.000   31  33.0000   4.0000      0.03415     0.538090
 37     6        2        95    802  29.000   30  33.0000   4.0000      0.09802     0.471782
 38     6        2       100   7449  29.000   31  33.0000   4.0000      0.00000     0.435172
 39     6        2       101    667  29.000   32  34.0000   5.0000     -0.04075     0.413778
 40     6        2       103    166  29.000   33  35.0000   6.0000     -0.09408     0.388473
 41     6        2       104    286  29.000   30  33.0000   4.0000      0.07518     0.449739
 42     6        2       111   3033  27.500   30  33.0000   5.5000     -0.05079     0.472771
 43     6        2       112     52  29.000   30  32.0000   3.0000     -0.01842     0.545337
 44     6        2       164   3344  28.000   30  33.0000   5.0000      0.04901     0.460380
 45     6        2       240    581  15.000   20  29.0000  14.0000      0.06107     0.257827
 46     6        2       242   1724  25.000   29  33.0000   8.0000     -0.02228     0.402009
 47     6        2       250    430  29.000   32  33.0000   4.0000     -0.10988     0.482274
 48     6        3        19     84  25.000   26  29.0000   4.0000      0.15562     0.428386
 49     6        3        20     84  23.000   26  29.0000   6.0000      0.09133     0.417561

OBS  [remaining columns of the same listing, in the same row order; column headings illegible]
  1   -0.3455   1.05556   -1.3536   -1.5800  -0.90686   0.50636  -0.68050  .
  2   -0.4000   2.80000    0.1066   -0.5821  -0.90812   0.80297  -0.21943  .
  3   -0.4000   2.70000   -1.3959   -1.8841  -0.90812   0.65714  -0.41986  .
  4   -0.3750   2.25000   -1.1693   -1.5917  -0.90686   0.61600  -0.48451  .
  5   -0.5833   2.08333    0.0933   -0.6163  -0.90686   0.82098  -0.19726  .
  6   -0.4375   1.18750   -2.0936   -2.4581  -0.90812   0.58059  -0.54370  .
  7   -0.7500   0.81250   -5.1738   -5.5843  -0.90812   0.60793  -0.49770  .
  8   -1.0000   1.10000   -1.5537   -2.3137  -0.90812   0.86235  -0.14809  .
  9   -0.7500   1.00000   -1.8184   -2.3698  -0.90812   0.69993  -0.35677  .
 10   -0.5714   1.50000   -0.0626   -0.6454  -0.90812   0.72225  -0.32539  .
 11   -0.6667   1.58333   -0.0152   -0.6651  -0.90812   0.77239  -0.25827  .
 12   -0.3284   0.68657   -1.0524   -1.1649  -0.90686   0.45186  -0.79437  .
 13   -0.4615   3.00000    0.1017   -0.5798  -0.90812   0.79720  -0.22665  .
 14   -0.3125   1.93750   -0.1479   -0.6297  -0.90812   0.65293  -0.42629  .
 15   -0.4444   2.55556   -0.6937   -1.3238  -0.90812   0.75730  -0.27800  .
 16   -0.4444   1.66667   -0.5910   -1.0575  -0.90812   0.64299  -0.44162  .
 17   -0.3636   1.00000   -0.8114   -1.1307  -0.90812   0.55493  -0.58891  .
 18   -0.7200   0.68000   -3.7113   -4.1667  -0.90812   0.63589  -0.45273  .
 19   -0.7692   0.65385   -1.3456   -1.8080  -0.90812   0.64040  -0.44567  .
 20   -0.7273   0.68182   -4.1139   -4.6001  -0.90812   0.65577  -0.42194  .
 21   -0.4545   1.40909   -0.5273   -0.9115  -0.90812   0.59221  -0.52390  .
 22   -0.5217   1.43478   -2.1322   -2.4893  -0.90812   0.57637  -0.55101  .
 23   -0.8333   1.33333   -7.7867   -8.4229  -0.90686   0.76287  -0.27066  .
 24   -0.5089   1.01781   -7.0616   -7.3181  -0.90686   0.52187  -0.65034  .
 25   -1.1261   2.25224   -6.9264   -7.9460  -0.90812   1.11795   0.11149  .
 26   -0.7500   1.12500   -7.2153   -7.9296  -0.90812   0.82389  -0.19372  .
 27   -1.0000   0.75000   -7.9774   -8.4502  -0.90812   0.64713  -0.43521  .
 28   -1.3349   0.84951   -7.2732   -8.1135  -0.90812   0.93442  -0.06783  .
 29   -1.0606   1.06059   -6.8373   -7.6235  -0.90812   0.88522  -0.12192  .
 30   -1.7500   1.50000   -7.3150   -8.3163  -0.90812   1.09776   0.09327  .
 31   -1.1364   1.13636   -5.8349   -7.1081  -0.90812   1.44067   0.36511  .
 32   -0.9167   0.66667   -7.0916   -7.5682  -0.90686   0.65036  -0.43023  .
 33   -0.9167   0.58333   -1.7481   -2.5181  -0.90812   0.87100  -0.13811  .
 34   -0.8750   1.12500   -7.0515   -7.5858  -0.90812   0.68816  -0.37374  .
 35   -1.6667   0.50000   -8.1278   -8.8591  -0.90812   0.83789  -0.17687  .
 36   -1.1250   1.62500   -4.4185   -5.5421  -0.90812   1.24040   0.21543  .
 37   -1.0000   1.50000   -0.4534   -1.3925  -0.90812   1.03154   0.03105  .
 38   -1.1250   1.37500    0.2642   -0.7829  -0.90812   1.14906   0.13894  .
 39   -1.0000   1.40000   -0.3265   -1.1803  -0.90812   0.94716  -0.05429  .
 40   -0.9167   0.75000   -6.3937   -7.0892  -0.90812   0.80842  -0.21268  .
 41   -1.0000   1.25000   -0.6947   -1.7583  -0.90812   1.16820   0.15547  .
 42   -0.7273   1.09091   -0.2712   -1.3387  -0.90812   1.17272   0.15933  .
 43   -1.3333   1.33333   -6.3356   -7.4244  -0.90812   1.19797   0.18063  .
 44   -0.8000   2.10000   -0.3333   -1.2640  -0.90812   1.02287   0.02262  .
 45    0.0714   0.78571   -1.3245   -1.3068  -0.90812   0.39618  -0.92590  .
 46   -0.4375   0.81250    0.0246   -0.8394  -0.90812   0.95689  -0.04407  .
 47   -1.2500   1.87500   -0.7696   -1.6981  -0.90812   1.02054   0.02033  .
 48   -0.5000   1.12500   -7.5269   -8.0800  -0.90686   0.70208  -0.35371  .
 49   -0.3333   1.25000   -6.1938   -6.6268  -0.90812   0.62182  -0.47511  .
30.0000   2.0000   0.27126  0.704507
31.0000   5.0000  -0.06978  0.379523
33.0000   4.0000  -0.10535  0.368432
36.0000   3.1200   0.02724  0.504708

35.0000   5.5000  -0.02102  0.403830

44.0000   8.0000  -0.04710  0.376295

41.5000   7.5000   0.03987  0.398162
42.0000   4.0000   0.09039  0.419874
44.0000   4.0000  -0.00626  0.363281
40.9375   2.9375  -0.04870  0.314599
42.0000   4.0000   0.01268  0.309138

36.5000   6.5000   0.19345  0.565437
40.0000   4.0000   0.01380  0.362123
42.0000   2.0000   0.17919  0.557002
42.0000   2.0000   0.00876  0.672536
50.0000  15.0000   0.15651  0.383497
48.3599   4.3599  -0.05873  0.461914
46.0000   4.3000   0.00828  0.362519
44.8750   3.8750  -0.04080  0.373627
 1.5000   2.00000   -6.2044   -7.4344  -0.90686   1.38154   0.32320
 0.7000   0.70000   -2.1660   -2.9263  -0.90812   0.86258  -0.14783  .
 1.2500   0.62501   -7.1877   -7.8403  -0.90812   0.77452  -0.25551
 1.9231   0.96153   -1.4831   -2.5190  -0.90812   1.13631   0.12779
 0.8054   0.87249   -4.6571   -5.2885  -0.90686   0.75918  -0.27551
 0.9763   0.83684   -7.1108   -7.7215  -0.90812   0.74273  -0.29742
 0.4590   0.78688   -1.6667   -2.3720  -0.90812   0.81641  -0.20284
 1.2000   0.70001   -5.5537   -6.3597  -0.90812   0.90293  -0.10211
 0.9643   0.75001   -1.5098   -2.4394  -0.90812   1.02177   0.02154
-1.7943   1.07657   -6.1105   -7.1068  -0.90812   1.09219   0.08818
 2.0833   1.41667   -7.3690   -8.3061  -0.90812   1.02944   0.02901
 1.3000   1.20000   -4.9297   -6.1265  -0.90812   1.33465   0.28867  .
 0.6667   1.13333   -0.4756   -1.0958  -0.90812   0.74982  -0.28793
 0.7857   1.42857   -0.0980   -0.7663  -0.90812   0.78671  -0.23990  .
 1.0000   1.16667   -0.9997   -1.8437  -0.90812   0.93786  -0.06415
 0.7857   0.71429   -2.2483   -2.9266  -0.90812   0.79469  -0.22981
 0.9091   0.90909   -6.8991   -7.7758  -0.90812   0.96907  -0.03142
 0.8571   1.28571    0.2503   -0.9520  -0.90812   1.34200   0.29416
 1.1250   0.43750   -1.3283   -2.1256  -0.90686   0.89624  -0.10955  .
 0.7857   1.85714    0.0516   -0.8871  -0.90812   1.03101   0.03054  .
 0.1013   0.70887   -1.4929   -1.4949  -0.90812   0.40409  -0.90613  .
-0.4762   0.90476   -3.2287   -4.0342  -0.90812   0.90245  -0.10264  .
 1.0000   1.06667   -1.6581   -2.4186  -0.90812   0.86273  -0.14766  .
-0.6250   1.75000   -7.3856   -7.9428  -0.90686   0.70493  -0.34966  .
 0.8750   1.00000   -7.0617   -7.6737  -0.90686   0.74464  -0.29486
 0.8511   0.68086   -8.3158   -8.8274  -0.90686   0.67351  -0.39526
-0.6250   0.62500   -8.1677   -8.6187  -0.90812   0.63311  -0.45711
-0.7500   1.00000   -8.3292   -9.3228  -0.90812   1.08931   0.08555  .
 0.2000   0.80000   -1.4149   -2.0477  -0.90812   0.75936  -0.27529
 0.0000   1.25000   -6.8548   -7.4064  -0.90812   0.70017  -0.35644
 0.0000   2.49996   -6.7873   -8.3991  -0.90812   2.02124   0.70371
 0.1667   3.00000   -1.4564   -2.1781  -0.90812   0.82991  -0.18644  .
-1.5000   1.50000   -7.8764   -8.9025  -0.90812   1.12527   0.11802  .
 0.3333   1.00000   -1.1591   -1.8038  -0.90812   0.76842  -0.26342  .
 0.3333   1.33333   -1.8450   -2.7436  -0.90812   0.99053  -0.00952  .
 0.0000   0.58333   -6.5088   -7.3084  -0.90812   0.89714  -0.10855  .
 0.6250   2.62500    0.1869   -0.8899  -0.90812   1.18372   0.16866  .
 0.3333   2.00000   -1.1158   -2.4396  -0.90812   1.51549   0.41574  .
 0.3000   1.30000   -2.6301   -3.7291  -0.90812   1.21036   0.19092  .
 0.1923   0.65385   -6.4074   -6.4934  -0.90812   0.43950  -0.82211  .
 0.1818   5.36363   -0.5111   -1.1246  -0.90812   0.74478  -0.29467  .
 0.2308   4.69231   -0.0432   -0.6174  -0.90812   0.71613  -0.33389  .
-0.3750   1.12500   -0.4216   -1.1079  -0.90812   0.80097  -0.22193  .
 1.2500   1.75000   -6.9107   -7.9306  -0.90686   1.11974   0.11310  .
 1.5000   1.50000   -1.5916   -2.8273  -0.90812   1.38751   0.32751  .
 0.1000   1.46667   -0.3297   -0.7312  -0.90812   0.60252  -0.50664  .
-1.1468   0.80276   -6.7082   -7.6067  -0.90686   0.99164  -0.00839  .
 0.9302   0.81395   -7.4606   -8.0711  -0.90812   0.74262  -0.29757  .
 0.9032   0.90324   -7.7439   -8.3910  -0.90812   0.77026  -0.26103  .
99
100  0.388799  -1.0000
-8.6831  -9.3572
0.485654  -0.1250  1.25000  -1.5900  -2.5285  102
-0.1250
-0.1667
4.50000
-7.3487
-2.1215
115
1.75000  0.0988
116  0.286119
0.89768
-8.0574  117  0.441134  -1.0000  0.87500  -7.2207
118
-0.1950
-1.1408
-1.3491
0.572547
0.3307
-0.1113  -0.7367
-1.7848
41.0000
-0.90686
-0.90812
-0.90812
-0.90812
0.04391
0.60830  -0.90812
0.59900
0.66354
-0.00303
136  2  3
50.1250  56.0000
0.024783  0.48401
2  3  113
38.2500
52.0000  13.7500  -0.054036
56.0000  12.7600  0.002850
0.019298  0.39258
143  3  2  97  10573  11.0000  13.0000  15.0000  4.0000  0.065000  0.47157
0.37170
0.265000  1.07239
0.67607
2
15.0000
18.0000  3.0000  0.130000  0.46486
19.0000  21.7500  6.7500  -0.022091  0.35933
15.0000  20.0000  24.0000  9.0000  0.038804  0.37734
111
14.0000  15.0000  18.0000  4.0000  0.113748  0.40674
2
248  16.0000  19.0000  22.0000  6.0000  0.078022  0.48477
14.0000  17.0000  5.0000  0.128838  0.45227
1.8750  2.2500  1.5734  3.0435  1.4815
1.0000
7.7500
1.0000
 1.8000  -1.0773  -1.9210  .    0.939  -0.06316  1.85783
 1.7500   0.0179  -0.8324  .    0.945  -0.05657  0.77584
54.0000   5.6519  -0.2058  .  141.298   4.95087  5.15666
 0.7463  -0.9628  -1.4863  .    0.682  -0.38332  1.10298
 1.5385   0.0063  -0.6597  .    0.786  -0.24087  0.41878
 1.8077  -0.3133  -0.8201  .    0.670  -0.40002  0.42011
 0.7302  -3.7986  -4.1619  .    0.581  -0.54366  3.61820
 1.0000  -1.3375  -1.9590  .    0.752  -0.28540  1.67357
 1.0625  -0.9529  -2.2132  .    1.424   0.35340  2.56656
 1.5833  -1.2025  -1.9647  .    0.865  -0.14471  1.81998
 1.5897  -0.5454  -1.3403  .    0.894  -0.11194  1.22837
 1.3500  -1.4246  -1.6591  .    0.510  -0.67239  0.98671
 6.0000  -0.4299  -1.3389  .    1.002   0.00215  1.34102
 5.5833   0.5960  -0.4769  .    1.181   0.16602  0.64293
 3.6667   0.5786  -0.5688  .    1.272   0.24050  0.80929
 1.1667  -0.0841  -0.6151  .    0.687  -0.37587  0.23926
 0.7568  -1.3346  -1.6888  .    0.575  -0.55274  1.13603
 1.6953  -5.8105  -6.0374  .    0.507  -0.67999  5.35745
 0.9565  -1.2237  -1.7680  .    0.696  -0.36247  1.40558
 1.8333  -1.3445  -1.8854  .    0.694  -0.36594  1.51947
 3.0000  -0.1000  -0.7010  .    0.736  -0.30590  0.39508
 1.6250  -1.9339  -2.3006  .    0.583  -0.54022  1.76037
 1.1388  -4.9206  -5.2071  .    0.538  -0.62028  4.58686
 2.1875   0.1278  -0.7405  .    0.962  -0.03859  0.70192
 1.0000  -0.1971  -0.7710  .    0.717  -0.33296  0.43803
 0.5825  -0.6155  -1.3129  .    0.811  -0.20954  1.10331
 1.9444   0.1266  -0.7220  .    0.943  -0.05830  0.66370
 1.4591  -4.8370  -5.4721  .    0.762  -0.27180  5.20027
 0.6015  -4.4886  -4.7421  .    0.520  -0.65337  4.08873
 1.5000  -1.3288  -2.3599  .    1.132   0.12421  2.48407
 1.3000  -3.4513  -4.2417  .    0.890  -0.11649  4.12522
 1.4167  -0.5358  -1.2137  .    0.795  -0.22891  0.98481
 1.4681  -0.7698  -1.3693  .    0.735  -0.30730  1.06203
ABSTRACT


The  objective  of  this  research  is  to  investigate  the  robustness  of 
discriminant  functions  to  non-normality.  This  study  will  assess  the 
performance  of  procedures  relative  to  measures  of  the  difference  between 
the  actual  distribution  of  the  observations  and  the  usual  assumption  of 
normal  densities.  For  example,  the  two  population,  mixed  distributions 
problem  with  equal  costs  of  misclassification  will  be  considered.  The 
parameters  will  be  estimated  by  maximum  likelihood  and  recently  proposed 
robust  methods. 
the LDF and the quadratic discriminant function (QDF) when covariance
matrices are unequal. She restricted the covariance matrices to differ
by only a scalar multiple, $\Sigma_2 = \sigma^2 \Sigma_1$, $\sigma^2 > 0$.
Adopting a canonical form from Dunn and Holloway [5], the densities
were transformed to $N(0, I)$ in $\Pi_1$ and $N(\mu, \sigma^2 I)$ in $\Pi_2$.
The LDF did satisfactorily in a moderate range of $\sigma^2$ (near one) and
improved as the distance between populations increased. Marks and Dunn
[13] also investigated the performance of discriminant functions when
covariance matrices differed. They considered a more general model
which has canonical form $N(0, I)$ in $\Pi_1$ and $N(\mu, \Lambda)$ in $\Pi_2$,
where $\Lambda = \mathrm{diag}(\lambda, \ldots, \lambda, 1, \ldots, 1)$. Using
Monte Carlo methods, the sample LDF outperformed the sample QDF only in
a small range of $\lambda$ near one.

Several studies have been performed to investigate the LDF under
non-normality. Lachenbruch, Sneeringer, and Revo [11] used three types
of non-linear transformations discussed in Johnson [10] to study the
LDF under non-normal conditions. They performed a Monte Carlo
experiment to simulate sampling from non-normal populations and
compared the misclassification probabilities to those expected when
both populations are normal. The sample LDF exhibited substantial
differences from expectations when sampling from normal populations,
and the overall misclassification probabilities increased for some of
the transformations. Moreover, the authors found the misclassification
probabilities for one population to be larger than expected while those
for the other population were smaller than expected. They did note that
the range of the variables affected the performance of the sample LDF;
a bounded variable produced less effect than an unbounded variable.
Their study was restricted to independently distributed variables.

Ashikaga ([2], [3]) has studied the LDF using the mixed-normal
distribution. The model is

$$f_1(x) = (1 - \alpha^{(1)})\, N(\mu_1^{(1)}, \Sigma_1) + \alpha^{(1)}\, N(\mu_1^{(1)} + \theta, \sigma^2 \Sigma_1)$$

and

$$f_2(x) = (1 - \alpha^{(2)})\, N(\mu_1^{(2)}, \Sigma_1) + \alpha^{(2)}\, N(\mu_1^{(2)} + \theta, \sigma^2 \Sigma_1), \qquad \sigma^2 > 0.$$

In canonical form these models reduce to

$$g_1(x) = (1 - \alpha^{(1)})\, N(0, I) + \alpha^{(1)}\, N(C'\theta, \sigma^2 I)$$

and

$$g_2(x) = (1 - \alpha^{(2)})\, N(C'(\mu_1^{(2)} - \mu_1^{(1)}), I) + \alpha^{(2)}\, N(C'(\mu_1^{(2)} - \mu_1^{(1)}) + C'\theta, \sigma^2 I),$$

where $\Sigma_1 = CC'$. In choosing mixture proportions ($\alpha^{(1)}$ and
$\alpha^{(2)}$), Ashikaga considered both populations to have mixed-normal
distributions ($\alpha^{(1)} = \alpha^{(2)}$) and the case where one
population had an assumed normal distribution while the other had a
mixed-normal distribution ($\alpha^{(1)} = 0$, $\alpha^{(2)} \ne 0$).

The distinctive feature of Ashikaga's study was the introduction of
two measures of non-normality which illuminate relationships between the
robustness of the LDF and the extent of non-normality. The first measure
was based on a multivariate skewness statistic of Malkovich and Afifi
[12]. A second measure derived by the author was an overall measure of
non-normality, it being the sample size necessary to test a simple
hypothesis that an observation is from a normal population versus the
alternative that it was from a mixed-normal population. When both
populations were mixed normal with identical distributions, the LDF did
well if $\Pi_1$ and $\Pi_2$ had sufficient separation (say, $\Delta > 4$,
where $\Delta$ is Mahalanobis distance). In the case of one population
having a normal distribution and the other a mixed-normal distribution,
the LDF performed poorly for all values of $\Delta^2$.

Randles and others [14] constructed two discriminant functions to
be robust to changes in the population model. The first is a
generalization of Fisher's [6] approach to derive the linear
discriminant function by maximizing the separation of the groups,

$$\frac{\beta'(\bar x^{(1)} - \bar x^{(2)})}{\sqrt{\beta' S \beta}}.$$

If $m = (\bar x^{(1)} + \bar x^{(2)})/2$, then Randles found the vector
$\beta_0$ which maximizes

$$\frac{1}{n_1} \sum_{i=1}^{n_1} \tau\!\left( \frac{\beta'(x_i^{(1)} - m)}{\sqrt{\beta' S \beta}} \right) + \frac{1}{n_2} \sum_{i=1}^{n_2} \tau\!\left( \frac{\beta'(m - x_i^{(2)})}{\sqrt{\beta' S \beta}} \right),$$

where $\tau$ is a nondecreasing and nonconstant odd function. The function
$\tau$ is selected to reduce the influence of observations far away from
the center, m.

The second procedure is to substitute Huber-type estimates for the parameters in the linear discriminant function. This method replaces each mean and covariance matrix with the robust estimators

$$\tilde{x}^{(j)} = \sum_{i=1}^{n_j} w_i x_i \Big/ \sum_{i=1}^{n_j} w_i$$

and

$$\tilde{S}^{(j)} = \sum_{i=1}^{n_j} w_i^2 (x_i - \tilde{x}^{(j)})(x_i - \tilde{x}^{(j)})' \Big/ \sum_{i=1}^{n_j} w_i^2, \qquad j = 1, 2,$$

respectively. The weights are

$$w_i = \begin{cases} 2/d_i, & \text{if } d_i > 2, \\ 1, & \text{if } d_i \le 2, \end{cases}$$

and the distance $d_i$ is defined by

$$d_i = [(x_i - \tilde{x})' \tilde{S}^{-1} (x_i - \tilde{x})]^{1/2}.$$

Randles  has  found  five  iterations  are  sufficient  to  reduce  the  effect 
of  outliers  by  computing  new  distances  and  weights  at  each  stage  using 
the  robust  estimates  of  the  previous  stage. 
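The iteration described above can be sketched in a few lines. The following is a minimal illustration for bivariate data only, written with the standard library; the function names, the `cutoff` and `iterations` parameters, and the restriction to two dimensions are assumptions made for the example and are not taken from Randles and others [14].

```python
import math

def _weighted_estimates(points, weights):
    """Weighted mean and covariance using weights w_i (mean) and w_i^2 (covariance)."""
    sw = sum(weights)
    mx = sum(w * x for w, (x, _) in zip(weights, points)) / sw
    my = sum(w * y for w, (_, y) in zip(weights, points)) / sw
    sw2 = sum(w * w for w in weights)
    sxx = sum(w * w * (x - mx) ** 2 for w, (x, _) in zip(weights, points)) / sw2
    syy = sum(w * w * (y - my) ** 2 for w, (_, y) in zip(weights, points)) / sw2
    sxy = sum(w * w * (x - mx) * (y - my) for w, (x, y) in zip(weights, points)) / sw2
    return (mx, my), (sxx, sxy, syy)

def huber_estimates(points, cutoff=2.0, iterations=5):
    """Huber-type mean and covariance for a list of (x, y) pairs (hypothetical helper)."""
    weights = [1.0] * len(points)              # stage 0: ordinary estimates
    for _ in range(iterations):
        (mx, my), (sxx, sxy, syy) = _weighted_estimates(points, weights)
        det = sxx * syy - sxy * sxy            # assumed non-zero (non-degenerate data)
        new_weights = []
        for x, y in points:
            dx, dy = x - mx, y - my
            # squared Mahalanobis distance using the explicit 2x2 inverse
            d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
            d = math.sqrt(max(d2, 0.0))
            new_weights.append(1.0 if d <= cutoff else cutoff / d)
        weights = new_weights                  # weights from the previous stage's estimates
    mean, cov = _weighted_estimates(points, weights)
    return mean, cov, weights
```

With a tight cluster of points and one distant observation, the distant point receives a weight below one and the robust mean moves back toward the cluster.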

2.  Bayes' Classification and Mixture Distributions

An observation is classified into one of $q$ populations (denoted by $\Pi_1, \Pi_2, \ldots, \Pi_q$) on the basis of a discrimination rule and a $p$-vector observation, $x = (x_1, x_2, \ldots, x_p)'$. Assume that the populations have equal costs of misclassification, but possibly different prior probabilities (denoted by $\pi_1, \pi_2, \ldots, \pi_q$). Also, assume that the distribution of $X$ is a composition or mixture of $m$ component distributions. Thus, if $f_j(x)$ represents the p.d.f. for $\Pi_j$, then

(1)  $$f_j(x) = \sum_{i=1}^{m} \alpha_i^{(j)} g_i^{(j)}(x),$$

where $g_i^{(j)}(x)$ is the $i$th component p.d.f. and $\alpha_i^{(j)}$ is the $i$th component mixing proportion for population $j$; $i = 1, \ldots, m$ and $j = 1, \ldots, q$. Equation (1) allows for a richer and more flexible class of p.d.f.'s than used in previous studies.

In general, a classification rule should depend upon whether or not the source component of an observation can be identified and this information incorporated. For example, let the $q$ populations represent forest/terrain types. Then multispectral scanner measurements on each population could be represented as a mixture of components (equation (1)). Additionally, if in sampling a sub-pixel could be pure and identified as having an observation from a particular component (say,
1.  Literature Review

Recall that Fisher [6] proposed the linear discriminant function (LDF) to classify an individual into one of two populations, $\Pi_1$ and $\Pi_2$. Let $S$ be the pooled covariance matrix and $d = \bar{x}^{(1)} - \bar{x}^{(2)}$ be the difference between means from samples drawn from the two populations. Then the sample LDF is

$$L(x) = \left(x - \tfrac{1}{2}(\bar{x}^{(1)} + \bar{x}^{(2)})\right)' S^{-1} d.$$

While Fisher's derivation was independent of any distributional assumption, it required that the populations have the same covariance matrix.
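As an illustration of the sample LDF, the sketch below builds $L(x)$ from two bivariate training samples using the pooled covariance matrix and an explicit 2x2 inverse. The function name `sample_ldf`, the restriction to two dimensions, and the equal-priors cutoff at zero are assumptions made for the example.

```python
def sample_ldf(sample1, sample2):
    """Return L(x) = (x - (m1 + m2)/2)' S^{-1} (m1 - m2) for bivariate samples."""
    def mean(sample):
        n = len(sample)
        return (sum(p[0] for p in sample) / n, sum(p[1] for p in sample) / n)

    def scatter(sample, m):
        sxx = sum((p[0] - m[0]) ** 2 for p in sample)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in sample)
        syy = sum((p[1] - m[1]) ** 2 for p in sample)
        return sxx, sxy, syy

    m1, m2 = mean(sample1), mean(sample2)
    dof = len(sample1) + len(sample2) - 2
    # pooled covariance matrix S
    sxx, sxy, syy = ((a + b) / dof
                     for a, b in zip(scatter(sample1, m1), scatter(sample2, m2)))
    det = sxx * syy - sxy * sxy                # assumed non-zero
    d = (m1[0] - m2[0], m1[1] - m2[1])
    # coefficient vector S^{-1} d
    coef = ((syy * d[0] - sxy * d[1]) / det, (sxx * d[1] - sxy * d[0]) / det)
    mid = ((m1[0] + m2[0]) / 2.0, (m1[1] + m2[1]) / 2.0)

    def ldf(x):
        return (x[0] - mid[0]) * coef[0] + (x[1] - mid[1]) * coef[1]

    return ldf
```

With equal prior probabilities, an observation would be assigned to the first population when `ldf(x)` is positive and to the second otherwise.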

Welch [16] obtained the Bayes' classification rule which minimizes the average probability of misclassification when the prior probabilities that an individual was selected from $\Pi_1$ or $\Pi_2$ are known. He established that the LDF was optimal (in the Bayes sense) if the observations in both populations are normally distributed with the same covariance matrix. Later, Wald [15] generalized this procedure to include costs of misclassification and also replaced any unknown parameters by their maximum likelihood estimates. Hoel and Peterson [8] extended these results to include more than two populations.

However, in practice the assumptions under which the linear discriminant function is Bayes are seldom satisfied. Nonetheless, the linear discriminant function with parameters estimated from training samples (sample LDF) is widely used and serves as a benchmark by which other procedures are judged.

A number  of  studies  have  considered  the  behavior  of  the  LDF  when 
assumptions  under  which  it  is  optimal  are  violated.  Gilbert  [7]  compared 
slash pine), then this information should be used both in estimation and classification. For other applications, see Chang and Afifi [4].

It is more likely, however, that such additional information is unavailable. Hosmer and Dick [9] present a fisheries example to illustrate this situation. The case in which the observation's component is not known will be the basic model for this study, but the known component case will also be considered.


2.1  Component Identity Known

Suppose $x$ is known to come from component $a$ and define an indicator vector $y = (y_1, y_2, \ldots, y_m)'$ such that

$$y_k = \begin{cases} 1, & k = a \\ 0, & k \ne a. \end{cases}$$

In this case $y$ follows a multinomial distribution with parameters $n = 1$ and $\alpha_1^{(j)}, \alpha_2^{(j)}, \ldots, \alpha_m^{(j)}$ in $\Pi_j$. Let the conditional distribution of $X$ given $y$ be $g_a^{(j)}(x)$ for component $a$ in $\Pi_j$. Then the joint distributions are

(2)  $$f(x, y) = g_a^{(j)}(x) \prod_{i=1}^{m} \left(\alpha_i^{(j)}\right)^{y_i}$$

in $\Pi_j$, $j = 1, \ldots, q$. For $y_a = 1$ the Bayes' classification rule is:

(3)  Assign observation $x$ to $\Pi_k$ if

$$\pi_k \alpha_a^{(k)} g_a^{(k)}(x) \ge \pi_j \alpha_a^{(j)} g_a^{(j)}(x)$$

for all $j \ne k$. If (3) is satisfied for more than one population $\Pi_k$, then assign the observation to population min(k).
Then $d_a^{(j)}(x) = \pi_j \alpha_a^{(j)} g_a^{(j)}(x)$ is the discriminant score for observations from $\Pi_j$ given that $y_a = 1$. In this case the probability of correct classification is

(4)  $$P_{Corr}(a) = \frac{\sum_{k=1}^{q} \pi_k \alpha_a^{(k)} \int_{D_{k|a}} g_a^{(k)}(x)\,dx}{\sum_{k=1}^{q} \pi_k \alpha_a^{(k)}},$$

where $D = \bigcup_{k=1}^{q} D_{k|a}$, $D_{k|a} \cap D_{k'|a} = \emptyset$, $k \ne k'$, is a partition of the sample space of $x$ determined by the Bayes rule. The total probability of correct classification is

(5)  $$P_{Corr} = \sum_{i=1}^{m} \left(\sum_{k=1}^{q} \pi_k \alpha_i^{(k)}\right) P_{Corr}(i) = \sum_{k=1}^{q} \sum_{i=1}^{m} \pi_k \alpha_i^{(k)} \int_{D_{k|i}} g_i^{(k)}(x)\,dx.$$

A special  case  of  the  above  result  is  given  by  Chang  and  Afifi  (1974) 
by  considering  the  two  population  case  when  the  conditional  distribution 
of  X given  Y = y is  multivariate  normal  (see  Table  1). 

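TABLE 1

Chang and Afifi's [4] Model for Barbiturate Overdosers

                          Population 1 (Survivors)                            Population 2 (Died)

Prior Probability         $\pi_1$                                             $\pi_2$

Component 1               $\alpha_1^{(1)} = 1 - \theta_1$                     $\alpha_1^{(2)} = 1 - \theta_2$
(Short-acting Drug)       $g_1^{(1)} = N(x; \mu_1, \Sigma)$                   $g_1^{(2)} = N(x; \mu_2, \Sigma)$

Component 2               $\alpha_2^{(1)} = \theta_1$                         $\alpha_2^{(2)} = \theta_2$
(Long-lasting Drug)       $g_2^{(1)} = N(x; \mu_1 + \Delta, \Sigma + \Gamma)$   $g_2^{(2)} = N(x; \mu_2 + \Delta, \Sigma + \Gamma)$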
The Bayes' classification procedure results in a "double" LDF rule: If $x$ belongs to component 1, then assign $x$ to $\Pi_1$ if

$$-\tfrac{1}{2}(\mu_1 + \mu_2)' \Sigma^{-1} (\mu_1 - \mu_2) + x' \Sigma^{-1} (\mu_1 - \mu_2) + \log[(1-\theta_1)/(1-\theta_2)] \ge \log(\pi_2/\pi_1).$$

Likewise, if $x$ belongs to component 2, then assign $x$ to $\Pi_1$ if

(6)  $$-\tfrac{1}{2}(2\Delta + \mu_1 + \mu_2)' (\Sigma + \Gamma)^{-1} (\mu_1 - \mu_2) + x' (\Sigma + \Gamma)^{-1} (\mu_1 - \mu_2) + \log(\theta_1/\theta_2) \ge \log(\pi_2/\pi_1).$$

If $x$ is not assigned to $\Pi_1$, then assign $x$ to $\Pi_2$.


2.2  Component Identity Unknown

When the component indicator vector $Y$ is unknown, the only data available is $X$. The component may be unknown because the pixel may be mixed, or the data could be retrospective or too costly to obtain. The class component densities are given in equation (1), and Bayes' rule is: Assign $x$ to $\Pi_k$ if

(7)  $$\pi_k \sum_{i=1}^{m} \alpha_i^{(k)} g_i^{(k)}(x) \ge \pi_j \sum_{i=1}^{m} \alpha_i^{(j)} g_i^{(j)}(x)$$

for $j = 1, \ldots, q$, and $k$ is the smallest index for which the inequalities hold. The probability of correctly classifying $x$ is

(8)  $$P_{Corr}(D) = \sum_{k=1}^{q} \pi_k \sum_{i=1}^{m} \alpha_i^{(k)} \int_{D_k} g_i^{(k)}(x)\,dx,$$

where $D = \bigcup_{k=1}^{q} D_k$ is a Bayes' rule partition of the sample space.
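The rule for an observation of unknown component can be illustrated with a small sketch. It assumes univariate normal components for simplicity; the function names and the `(alpha, mean, variance)` tuple layout are hypothetical choices for the example, not notation from the paper.

```python
import math

def normal_pdf(x, mean, var):
    """Univariate normal density."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def bayes_classify(x, priors, mixtures):
    """Return the 0-based index k maximizing pi_k * sum_i alpha_i^(k) g_i^(k)(x).

    mixtures[k] is a list of (alpha, mean, variance) tuples for population k.
    """
    scores = [pi * sum(a * normal_pdf(x, m, v) for a, m, v in comps)
              for pi, comps in zip(priors, mixtures)]
    # list.index returns the first maximum, i.e. the smallest k on ties
    return scores.index(max(scores))
```

For example, with two populations whose components are shifted by three units, an observation well to the left is assigned to the first population and one well to the right to the second.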
Risk Characterization

Let $R_i^{(k)}(x)$ be the risk that an observation $x$ is from the $i$th component, given that $x \in \Pi_k$; $i = 1, 2, \ldots, m$; $k = 1, 2, \ldots, q$. Thus,

(9)  $$R_i^{(k)}(x) = P\{X \in \text{Component } i \mid X = x \in \Pi_k\} = P\{Y_i = 1 \mid X = x \in \Pi_k\} = \frac{\alpha_i^{(k)} g_i^{(k)}(x)}{\sum_{\ell=1}^{m} \alpha_\ell^{(k)} g_\ell^{(k)}(x)}.$$

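Now relate the Bayes' classification rule when the component of $x$ is unknown to the $m$ possible Bayes' classification rules when the component of $x$ is known. Define

(10)  $$\Lambda_{j,k}(x) = \frac{\pi_k \sum_{i=1}^{m} \alpha_i^{(k)} g_i^{(k)}(x)}{\pi_j \sum_{i=1}^{m} \alpha_i^{(j)} g_i^{(j)}(x)}$$

to be the weighted likelihood ratio for $\Pi_j$ and $\Pi_k$ under model (2). Similarly, define

(11)  $$\lambda_{j,k}^{(i)}(x) = \frac{\pi_k \alpha_i^{(k)} g_i^{(k)}(x)}{\pi_j \alpha_i^{(j)} g_i^{(j)}(x)}.$$

Theorem 1: If $R_i^{(k)}(x) > 0$ for $i = 1, \ldots, m$, then $\Lambda_{j,k}(x) \ge 1$ if and only if

$$\sum_{i=1}^{m} R_i^{(k)}(x) \left[\lambda_{j,k}^{(i)}(x)\right]^{-1} \le 1.$$

Proof: From (10), $\Lambda_{j,k}(x) \ge 1$ implies that

$$\frac{\pi_j \sum_{i=1}^{m} \alpha_i^{(j)} g_i^{(j)}(x)}{\pi_k \sum_{\ell=1}^{m} \alpha_\ell^{(k)} g_\ell^{(k)}(x)} \le 1.$$

Algebraic manipulations yield the result.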
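The identity behind Theorem 1 can be checked numerically. The sketch below assumes univariate normal components and computes both the weighted likelihood ratio of (10) and the risk-weighted sum of inverse component ratios; by the algebra of the proof the sum should equal the reciprocal of the ratio. All function and argument names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Univariate normal density."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def theorem1_terms(x, pi_j, comps_j, pi_k, comps_k):
    """Return (Lambda_{j,k}(x), sum_i R_i^(k)(x) / lambda_{j,k}^(i)(x)).

    comps_j and comps_k are lists of (alpha, mean, variance) with equal length m.
    """
    wj = [a * normal_pdf(x, m, v) for a, m, v in comps_j]   # alpha_i^(j) g_i^(j)(x)
    wk = [a * normal_pdf(x, m, v) for a, m, v in comps_k]   # alpha_i^(k) g_i^(k)(x)
    big_lambda = pi_k * sum(wk) / (pi_j * sum(wj))
    risks_k = [w / sum(wk) for w in wk]                      # R_i^(k)(x)
    small_lambdas = [pi_k * b / (pi_j * a) for a, b in zip(wj, wk)]
    weighted_sum = sum(r / lam for r, lam in zip(risks_k, small_lambdas))
    return big_lambda, weighted_sum
```

In any such example the second returned value is at most one exactly when the first is at least one, which is the statement of the theorem.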

For a particular value of $x$ with $R_i^{(k)}(x) > 0$ for $i = 1, 2, \ldots, m$, note that the Bayes' classification rule when the component is unidentified is a convex combination of the $m$ alternative rules in the components of $\Pi_k$. Now compare the Bayes' rule for the component-known model with that for the component-unknown model. First, define the indicator function as follows:

(12)  $$I_i(x) = \begin{cases} 1, & \text{if } x \in \text{Component } i \\ 0, & \text{if } x \notin \text{Component } i. \end{cases}$$

Bayes' rule: Component of $x$ known:
Assign $x$ to $\Pi_k$ if

$$\sum_{i=1}^{m} I_i(x) \left[\lambda_{j,k}^{(i)}(x)\right]^{-1} \le 1$$

for $j, k = 1, 2, \ldots, q$. If this inequality holds for more than one value of $k$, then assign $x$ to the population with smallest $k$.

Bayes' rule: Component of $x$ unknown and $R_i^{(k)}(x) > 0$ for $i = 1, 2, \ldots, m$:
Assign $x$ to $\Pi_k$ if

$$\sum_{i=1}^{m} R_i^{(k)}(x) \left[\lambda_{j,k}^{(i)}(x)\right]^{-1} \le 1$$

for $j, k = 1, 2, \ldots, q$. If this inequality holds for more than one value of $k$, then assign $x$ to the population with smallest $k$.

Example 1: Let population $j$ have an $m$-component distribution where the $i$th component has a $p$-variate normal distribution with

$Q_{i,\ell}(x)$ is a QDF and $L_i^{(j,k)}(x)$ is the LDF.

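Theorem 2: If $R_i^{(k)}(x) = 0$ for $i \in I_0 \subset \{1, 2, \ldots, m\}$, then $\Lambda_{j,k}(x) \ge 1$ if and only if

$$\left[\Lambda_{j,k}(x)\right]^{-1} \sum_{i \in I_0} R_i^{(j)}(x) + \sum_{i \notin I_0} R_i^{(k)}(x) \left[\lambda_{j,k}^{(i)}(x)\right]^{-1} \le 1.$$

Proof:

$$\Lambda_{j,k}(x) = \frac{\pi_k \sum_{i=1}^{m} \alpha_i^{(k)} g_i^{(k)}(x)}{\pi_j \sum_{i=1}^{m} \alpha_i^{(j)} g_i^{(j)}(x)} \ge 1$$

$$\iff \sum_{i=1}^{m} \frac{\pi_j \alpha_i^{(j)} g_i^{(j)}(x)}{\pi_k \sum_{\ell=1}^{m} \alpha_\ell^{(k)} g_\ell^{(k)}(x)} \le 1$$

$$\iff \sum_{i \in I_0} \frac{\pi_j \alpha_i^{(j)} g_i^{(j)}(x)}{\pi_k \sum_{\ell=1}^{m} \alpha_\ell^{(k)} g_\ell^{(k)}(x)} + \sum_{i \notin I_0} R_i^{(k)}(x) \left[\lambda_{j,k}^{(i)}(x)\right]^{-1} \le 1$$

$$\iff \left[\Lambda_{j,k}(x)\right]^{-1} \sum_{i \in I_0} R_i^{(j)}(x) + \sum_{i \notin I_0} R_i^{(k)}(x) \left[\lambda_{j,k}^{(i)}(x)\right]^{-1} \le 1.$$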
Corollary: If $\lambda_{j,k}^{(i)}(x) \ge 1$ for $i = 1, 2, \ldots, m$, then assign the observation (whose component is unknown) to $\Pi_k$.

Proof: This is a direct result of (11) and (16). Thus, all observations which would be classified into $\Pi_k$ regardless of their component of origin will be assigned to $\Pi_k$ when the component is unknown. Those observations
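mean $= \mu_i + \theta^{(j)}$, variance $= \sigma_i^2 \Sigma$, $i = 1, \ldots, m$,

and mixture parameters $\alpha_i^{(j)}$, $j = 1, \ldots, q$. Also, define $\phi_{i,\ell}^{(k)}$, the weighted likelihood function, to be

(15)  $$\phi_{i,\ell}^{(k)} = \alpha_i^{(k)} g_i^{(k)}(x) \big/ \alpha_\ell^{(k)} g_\ell^{(k)}(x) = R_i^{(k)}(x) \big/ R_\ell^{(k)}(x).$$

Then $\phi_{i,\ell}^{(k)}$ represents the weighted likelihood ratio between the density of an observation $x$ from the $i$th component of $\Pi_k$ and the density from the $\ell$th component of $\Pi_k$. In this case, the Bayes' rule is as follows: Classify $x$ into $\Pi_k$ if

$$\frac{\pi_k \sum_{i=1}^{m} \alpha_i^{(k)} N(x; \mu_i + \theta^{(k)}, \sigma_i^2 \Sigma)}{\pi_j \sum_{i=1}^{m} \alpha_i^{(j)} N(x; \mu_i + \theta^{(j)}, \sigma_i^2 \Sigma)} \ge 1; \qquad j = 1, 2, \ldots, q,$$

or

$$\sum_{i=1}^{m} R_i^{(k)}(x) \left[\lambda_{j,k}^{(i)}(x)\right]^{-1} = \sum_{i=1}^{m} \left[1 + \sum_{\ell \ne i} \phi_{\ell,i}^{(k)}\right]^{-1} \left[\frac{\pi_k \alpha_i^{(k)} N(x; \mu_i + \theta^{(k)}, \sigma_i^2 \Sigma)}{\pi_j \alpha_i^{(j)} N(x; \mu_i + \theta^{(j)}, \sigma_i^2 \Sigma)}\right]^{-1}$$

$$= \sum_{i=1}^{m} \left[1 + \sum_{\ell \ne i} \exp\{Q_{i,\ell}(x)\}\right]^{-1} \frac{\pi_j \alpha_i^{(j)}}{\pi_k \alpha_i^{(k)}} \left[\exp\{L_i^{(j,k)}(x)\}\right]^{-1} \le 1,$$

where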
classified into $\Pi_k$ by the Bayes' rule when the component is identified, but with at least one $\lambda_{j,k}^{(i)} < 1$, are not necessarily assigned to $\Pi_k$ with the component information unknown.

3.  Simulation  Studies 

A simulation experiment was conducted to investigate the robustness of the LDF with plug-in estimates under moderate non-normality. Section 3.1 describes the simulation experiment. The Bayes' rule and sample LDF errors are described using measures of non-normality in Section 3.2, while the difference in their classification performances is studied in Section 3.3. Lastly, the performances of the sample LDF using maximum likelihood estimates and Huber-type estimates are compared in Section 3.4.

3.1  The  Simulation  Experiment 

The simulation experiment to investigate the robustness of the sample LDF to non-normality is based on the two-component mixed-normal distribution. The classification model studied was the canonical form of the distribution with proportional component covariance matrices. The result, due to Ashikaga [3], is

$$f_1(x) = (1 - \alpha) N(0, I) + \alpha N(\theta, \sigma^2 I) \quad \text{in } \Pi_1$$

and

$$f_2(x) = (1 - \alpha) N(\mu, I) + \alpha N(\mu + \theta, \sigma^2 I) \quad \text{in } \Pi_2,$$

where $0 \le \alpha \le 1$, $\sigma^2 \ge 1$, $\mu = (\mu_1, 0, \ldots, 0)'$, and $\theta = (\theta_1, \theta_2, 0, \ldots, 0)'$. Table 2 lists the parametric configurations which were studied.
TABLE 2

Parameter Values Studied

$\alpha$ = 0, .1, .2, .3, .4, .5, .6, .7, .8, .9
$\|\theta\| = \left(\sum_{i=1}^{p} \theta_i^2\right)^{1/2}$ = 0, 1, 2, 3
$\sigma^2$ = 1, 4, 9, 16
$\Delta^2$ = 1, 4

The LDF was studied when the parameters were replaced by maximum likelihood estimates as in Anderson [1] and by Huber-type estimates as in Randles and others [14].

The robustness criterion was the difference in misclassification errors between the LDF with plug-in parameter estimates and the Bayes' rule with parameters known. The LDF misclassification errors were computed from 100 repetitions of the following scheme:

(1)  Draw training samples of size $n_1$ from $\Pi_1$ and $n_2$ from $\Pi_2$ and compute the LDF.

(2)  Draw an index sample of size 50 from $\Pi_1$ and size 50 from $\Pi_2$. Classify the index samples and compute the average misclassification probability.

The Bayes' rule errors were also computed using Monte Carlo procedures due to the difficulty of the numerical computation.
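The two-step scheme above can be sketched as a small Monte Carlo routine. This is an illustration only: it assumes a univariate ($p = 1$) version of the mixed-normal model, uses the maximum-likelihood style sample LDF, and the function names, argument names, and seed handling are hypothetical.

```python
import random

def draw(n, shift, alpha, theta, sigma, rng):
    """Draw n values from (1 - alpha) N(shift, 1) + alpha N(shift + theta, sigma^2)."""
    return [rng.gauss(shift + theta, sigma) if rng.random() < alpha
            else rng.gauss(shift, 1.0) for _ in range(n)]

def ldf_error(n1, n2, mu, alpha, theta, sigma, reps=100, n_index=50, seed=0):
    """Average misclassification rate of the sample LDF over `reps` repetitions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # step (1): training samples and the sample LDF
        t1 = draw(n1, 0.0, alpha, theta, sigma, rng)
        t2 = draw(n2, mu, alpha, theta, sigma, rng)
        m1, m2 = sum(t1) / n1, sum(t2) / n2
        s2 = (sum((x - m1) ** 2 for x in t1)
              + sum((x - m2) ** 2 for x in t2)) / (n1 + n2 - 2)
        coef, mid = (m1 - m2) / s2, (m1 + m2) / 2.0
        # step (2): index samples, classified by the sign of the LDF
        i1 = draw(n_index, 0.0, alpha, theta, sigma, rng)
        i2 = draw(n_index, mu, alpha, theta, sigma, rng)
        wrong = (sum((x - mid) * coef <= 0 for x in i1)
                 + sum((x - mid) * coef > 0 for x in i2))
        total += wrong / (2.0 * n_index)
    return total / reps
```

Here `mu` plays the role of the separation between the populations, so larger values should give smaller estimated error rates.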

The misclassification errors were indexed by the Mahalanobis distance between populations, $\Delta^2$, and measures of non-normality. Two measures introduced by Malkovich and Afifi [12] and studied by Ashikaga [3] were multivariate skewness

$$\beta_1^* = \max_c \{\beta_1(c'x)\}$$

and multivariate kurtosis

$$\beta_2^* = \max_c \{[\beta_2(c'x) - 3]^2\},$$

where $\beta_1$ and $\beta_2$ are the univariate skewness and kurtosis measures.

3.2  Probabilities of Misclassification vs. $\beta_1^*$ and $\beta_2^*$

Prior to looking at various plots of differences in misclassification errors between the Bayes' rule and sample LDF classifiers, it is helpful to consider the relationship between the actual level of misclassification error and indicators of non-normality.

The overall Bayes' misclassification errors are plotted against the skewness measure $\beta_1^*$ in Figure 3.1 for $\Delta^2 = 1$ and Figure 3.2 for $\Delta^2 = 4$. For the particular mixed-normal pdf's under study the largest errors occurred when the pdf was symmetrical. The maximum errors decrease as the skewness $\beta_1^*$ rises to moderate values (3 to 4). Representative graphs of the overall misclassification error for the class of LDF's with plug-in parameter estimates are given in Figures 3.3 and 3.4 for $\Delta^2 = 1$ and $\Delta^2 = 4$, respectively. Here the LDF was estimated by Huber-type estimators as in Randles and others [14]. The training samples had 25 observations from each population. These graphs are similar to the plots of the Bayes' error, except that the maximum errors were approximately two percent larger than the Bayes' errors at $\Delta^2 = 1$, but only one percent larger at $\Delta^2 = 4$. While the graphs for the Bayes' errors and the sample LDF are similar for the largest errors at various levels of
two classifiers, but for training samples of size 25 from each population. No differences are noted from the situation with smaller training samples.

Figures 3.13 through 3.16 plot the differences between the sample LDF and Bayes' classification errors when the parameters of the LDF were replaced by Huber-type estimators. Once more there is a drop in the difference between the errors from approximately 9 percent at $\Delta^2 = 1$ to less than 2.5 percent at $\Delta^2 = 4$. As with the maximum likelihood estimated LDF, no relationship was shown between skewness and the difference in overall errors.

The differences between the sample LDF with maximum likelihood estimators and the Bayes' rule errors vs. $\beta_2^*$ are plotted in Figures 3.17 through 3.20. There appears to be a decrease in the largest differences for higher values of kurtosis when $\Delta^2 = 1$. These differences are much smaller when $\Delta^2 = 4$. It has been previously shown by Ashikaga [3] that the LDF is the Bayes' rule for scale-contaminated mixed-normal models. Thus, for this sub-class of mixed-normal models, the effects of kurtosis on the differences between the sample LDF and the Bayes' classifier errors present themselves solely through the plug-in parameter estimators. For training sample sizes of 25 from each population, those models with only scale-contamination exhibited under two percent difference between the errors of the two procedures at $\Delta^2 = 1$. For the entire group of mixed-normal models studied, the difference between these two errors ranged up to 9 percent when kurtosis was 0.
3.4  Sample LDF Misclassification Errors vs. $\beta_1^*$ and $\beta_2^*$

Figures 3.21 and 3.22 plot the differences between overall misclassification errors when the parameters of the LDF are replaced by Huber-type estimators and maximum likelihood estimators versus $\beta_1^*$. Here we have training samples of size 25 from each population. We see that the largest differences between these two plug-in schemes decrease from approximately two percent for $\Delta^2 = 1$ to 1.25 percent at $\Delta^2 = 4$. For $\Delta^2 = 1$ the largest differences in the two errors seem to shrink as $\beta_1^*$ increases, but this is based on relatively few pdf's with moderate skewness. Similar results are obtained for $\beta_2^*$ in Figures 3.23 and 3.24. In the sub-class of scale-contaminated distributions, the difference in these two error rates was under 0.5 percent for $\Delta^2 = 1$ and 0.3 percent for
skewness, these plots do not reveal relationships between the Bayes' errors and sample LDF errors for particular distributions. We will need to examine the actual differences between the two classification schemes for particular distributions in order to study the robustness of the sample LDF. The overall misclassification errors were also plotted against the kurtosis coefficient $\beta_2^*$. Figures 3.5 and 3.6 graph the Bayes' error against $\beta_2^*$ and Figures 3.7 and 3.8 the sample LDF with Huber-type estimates against $\beta_2^*$. The largest errors occur when $\beta_2^*$ is near zero and decrease quickly for $\beta_2^*$ greater than five.

A drawback of Malkovich and Afifi's [12] multivariate kurtosis measure is that the linear combination of $x$ with univariate kurtosis most different from 3, the value of $\beta_2$ for a univariate normal distribution, can correspond to either a flat or peaked distribution. Reinspecting Figures 3.5 and 3.6, the points with largest misclassification errors (circled) correspond to platykurtic or normal pdfs.

3.3  Differences between the Sample LDF and Bayes' Errors for $\beta_1^*$ and $\beta_2^*$

Figures 3.9 and 3.10 plot the differences between the errors for the sample LDF and the Bayes' classifiers (P(Sample LDF) - P(Bayes)). Here the LDF was estimated by maximum likelihood from training samples of size 15 from each population. The maximum differences between misclassification errors were approximately 9 percent for $\Delta^2 = 1$ and dropped to less than 2.5 percent for $\Delta^2 = 4$. Neither graph indicated any relationship between the skewness coefficient and the difference in errors. Figures 3.11 and 3.12 also plot the differences between errors for these
ABSTRACT

It is suggested that using a modified analysis-of-variance procedure on data sampled systematically from a rectangular array of image data can provide a measure of homogeneity of means over that array in single directions and of how variation in perpendicular directions interacts. The modification of analysis of variance required to account for spatial correlation is described theoretically and illustrated numerically on simulated data.
1.  Introduction

Incorporating spatial correlation into the analysis of multivariate image data observed in the plane leads to massive data management and computational problems. In this paper we describe an initial attempt to answer problems in the plane by sampling the data in parallel transects so that one need only consider correlation in one direction. Thus, given a (K x T) array of d-dimensional observations, divide the K rows into $g$ groups and select $n_i$ rows for the $i$th group so that rows within a group are essentially uncorrelated. Then the correlation within rows can be modeled using ordinary time series techniques and can be incorporated in an analysis of variance procedure in analogy with that for long repeated measures designs.

Let $y_{ik} = (y_{i1k}^T, \ldots, y_{iTk}^T)^T$ be a (Td x 1) random vector representing the T d-dimensional vectors for the $k$th observation in the $i$th group of observations, $k = 1, \ldots, n_i$, $i = 1, \ldots, g$. Assume

(1)  $$y_{ijk} = \mu_{ij} + \eta_{ijk}$$

where the $\eta$'s are zero mean random vectors which are uncorrelated for different $i$'s and/or $k$'s, but $\eta_{ijk}$'s having the same $i$ and $k$ are correlated. Thus in (196 x 117) 4-dimensional image data one might let $g$ be between 3 and 5 and the $n_i$'s be 4 or 5. In this paper, then, we visualize analyzing the means of a small number of groups of time series (here the "time" index $j$ represents position within a row, i.e., the East-West location of an observation).

In Section 2 we consider the univariate case, i.e., using data on only one channel or some function of four channels at each location. Then in Section 3 we discuss possible extensions to the general d-dimensional case.

2.  Univariate  Long  Repeated  Measures  Analysis 

When d is one, equation (1) appears to be describing a two-factor analysis of variance model with the factors being group number and time (i.e., column index). Such data are often called repeated measures data since, because of the correlation, one can think of $y_{ik}$ as containing repeated measurements on the same experimental unit.

There are three basic hypotheses one is interested in testing: 1) equality of group means averaged over time (denoted $H_G$), 2) equality of time means averaged over group ($H_T$), and 3) no interaction between group and time means (denoted $H_{GT}$), i.e., the graphs of the group means over time are "parallel". In analyzing image data we visualize using the test of $H_G$ to measure homogeneity in the North-South direction, $H_T$ to measure homogeneity East-West, and $H_{GT}$ to measure whether variability in the North-South direction is constant over East-West location. Also, arrays at varying locations can be fairly quickly classified using such a procedure. In Table 1 we list the statistics used to test these hypotheses and their null distributions in the case of no correlation within rows. We then describe how these tests can be modified to account for correlation.
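Table 1.  The Usual Two Factor ANOVA

Hypothesis    Test Statistic                                                                      Null Distribution

$H_G$         $F_G = \dfrac{SS_G/(g-1)}{SSE_1/(N-g)} = \dfrac{MS_G}{MSE_1}$                        $F_{g-1,\,N-g}$

$H_T$         $F_T = \dfrac{SS_T/(T-1)}{SSE_2/(T-1)(N-g)} = \dfrac{MS_T}{MSE_2}$                   $F_{T-1,\,(T-1)(N-g)}$

$H_{GT}$      $F_{GT} = \dfrac{SS_{GT}/(T-1)(g-1)}{SSE_2/(T-1)(N-g)} = \dfrac{MS_{GT}}{MSE_2}$     $F_{(T-1)(g-1),\,(T-1)(N-g)}$

Here $N = \sum_{i=1}^{g} n_i$, and, e.g., $\bar{y}_{\cdot j \cdot} = \sum_{i=1}^{g} \sum_{k=1}^{n_i} y_{ijk}/N$,

$$SS_G = \sum_{i=1}^{g} n_i (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2, \qquad SSE_1 = \sum_{i=1}^{g} \sum_{k=1}^{n_i} (\bar{y}_{i\cdot k} - \bar{y}_{i\cdot\cdot})^2,$$

$$SS_T = \sum_{j=1}^{T} N (\bar{y}_{\cdot j \cdot} - \bar{y}_{\cdot\cdot\cdot})^2, \qquad SSE_2 = \sum_{i=1}^{g} \sum_{j=1}^{T} \sum_{k=1}^{n_i} (y_{ijk} - \bar{y}_{ij\cdot} - \bar{y}_{i\cdot k} + \bar{y}_{i\cdot\cdot})^2,$$

$$SS_{GT} = \sum_{i=1}^{g} \sum_{j=1}^{T} n_i (\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j \cdot} + \bar{y}_{\cdot\cdot\cdot})^2.$$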
To incorporate correlation into the analysis, we let $\Sigma_{ik}$ be the (T x T) covariance matrix of $y_{ik}$. In this paper, we shall assume that $\Sigma_{ik} = \Sigma$ for all $i$ and $k$. Thus we are assuming that the $y_{ik}$ are independent $N_T(\mu_i, \Sigma)$ random variables where $\mu_i^T = (\mu_{i1}, \ldots, \mu_{iT})$. The following theorem indicates how the analysis can be modified when $\Sigma \ne \sigma^2 I_T$.
Theorem 1 (Geisser and Greenhouse [2])

a)  The null distribution of $F_G$ is unaffected by correlation.

b)  The null distributions of $F_T$ and $F_{GT}$ are approximately

$$F_T \sim F_{\epsilon(T-1),\,\epsilon(T-1)(N-g)}, \qquad F_{GT} \sim F_{\epsilon(T-1)(g-1),\,\epsilon(T-1)(N-g)},$$

where the degrees of freedom reduction factor $\epsilon$ is given by

$$\epsilon = \frac{[\mathrm{tr}(A\Sigma)]^2}{(T-1)\,\mathrm{tr}(A\Sigma A\Sigma)},$$

where $A = I_T - \frac{1}{T} 1_T 1_T^T$ and $1_T$ is a T-vector of ones.

c)  A lower bound for $\epsilon$ is $\epsilon \ge \frac{1}{T-1}$, and thus conservative $(1-\alpha)$ level tests for $H_T$ and $H_{GT}$ are to compare $F_T$ and $F_{GT}$ to $F_{\alpha,1,N-g}$ and $F_{\alpha,g-1,N-g}$ respectively.

Note that $\epsilon$ can be written as

$$\epsilon = \frac{\left[\sum_{i=1}^{T-1} \lambda_i(A\Sigma)\right]^2}{(T-1) \sum_{i=1}^{T-1} \lambda_i^2(A\Sigma)},$$

where $\lambda_1(A\Sigma) \ge \cdots \ge \lambda_{T-1}(A\Sigma)$ are the T-1 nonzero eigenvalues of the rank T-1 matrix $A\Sigma$. Thus from (2) it is easy to see that $\epsilon = 1$ (and the F tests with no degrees of freedom reduction for correlation are correct) if and only if all the eigenvalues of $A\Sigma$ are the same.
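The reduction factor can be evaluated for any given covariance matrix without an eigenvalue routine, since only two traces are needed. The following is a small standard-library sketch; the function names `epsilon` and `toeplitz` are hypothetical, and the covariance matrix is passed as a list of lists.

```python
def toeplitz(acov):
    """Toeplitz matrix whose (j, k) element is acov[|j - k|]."""
    t = len(acov)
    return [[acov[abs(i - j)] for j in range(t)] for i in range(t)]

def epsilon(cov):
    """[tr(A S)]^2 / ((T - 1) tr(A S A S)) with A = I - (1/T) 1 1'."""
    t = len(cov)
    # A S subtracts from each column of S that column's mean
    col_means = [sum(cov[i][j] for i in range(t)) / t for j in range(t)]
    a_s = [[cov[i][j] - col_means[j] for j in range(t)] for i in range(t)]
    trace1 = sum(a_s[i][i] for i in range(t))
    # tr(M M) = sum over i, j of M_ij * M_ji
    trace2 = sum(a_s[i][j] * a_s[j][i] for i in range(t) for j in range(t))
    return trace1 ** 2 / ((t - 1) * trace2)
```

For an identity covariance matrix the function returns one, and for a Toeplitz matrix built from a decaying autocovariance sequence it returns a value between $1/(T-1)$ and one.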

The results above are for a general, symmetric, positive definite matrix $\Sigma$. It seems clear in the image data problem that it is reasonable to assume that $\Sigma$ is Toeplitz, i.e.,

$$\Sigma = \mathrm{Toepl}(\sigma(0), \sigma(1), \ldots, \sigma(T-1)),$$

i.e., the $(j,k)$th element of $\Sigma$ is a number $\sigma(|j-k|)$. Thus we are assuming that for each $i, k$, $\eta_{i1k}, \ldots, \eta_{iTk}$ is a sample realization from a covariance stationary time series having autocovariance function $\sigma(\cdot)$.

Two questions naturally arise: 1) Is there a higher lower bound for $\epsilon$ than $1/(T-1)$ when $\Sigma$ is Toeplitz? 2) Can one use an estimator of $\epsilon$ in the test rather than routinely performing the conservative test?

Epsilon  for  Toeplitz  Matrices 

We let $\Sigma_T = \mathrm{Toepl}(\sigma(0), \ldots, \sigma(T-1))$, $A_T = I_T - \frac{1}{T} 1_T 1_T^T$, and also index $\epsilon$ with a T, i.e.,

$$\epsilon_T = \frac{[\mathrm{tr}(A_T \Sigma_T)]^2}{(T-1)\,\mathrm{tr}(A_T \Sigma_T A_T \Sigma_T)}.$$

While there appears to be no easily written lower bound for $\epsilon_T$ in terms of series length T, experience with a large number of possible autocovariance sequences indicates that $\epsilon_T$ rarely falls more than one or two percent below its limit as $T \to \infty$. This limit is given in the following theorem.

Theorem 2 (Spector and Newton [7])

If the covariance sequence $\{\sigma(v), v = 0, \pm 1, \ldots\}$ is absolutely summable, then

$$\lim_{T \to \infty} \epsilon_T = \frac{\sigma^2(0)}{\sum_{v=-\infty}^{\infty} \sigma^2(v)} = \frac{1}{\sum_{v=-\infty}^{\infty} \rho^2(v)},$$

where $\rho(v) = \sigma(v)/\sigma(0)$ is the autocorrelation function corresponding to $\sigma(\cdot)$. Further, if $f(\omega)$, $\omega \in [-\pi, \pi]$, is the spectral density function corresponding to $\sigma(\cdot)$, then

$$\lim_{T \to \infty} \epsilon_T = \frac{\left[\int_{-\pi}^{\pi} f(\omega)\,d\omega\right]^2}{2\pi \int_{-\pi}^{\pi} f^2(\omega)\,d\omega}.$$

We note that these quantities and their estimation have arisen elsewhere in time series analysis (see Parzen [5], p. 984 for example).
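As a worked illustration of the limit, consider a first-order autoregressive sequence with autocorrelation $\rho(v) = \rho^{|v|}$. Summing the geometric series gives $\sum_v \rho^2(v) = (1+\rho^2)/(1-\rho^2)$, so the limit is $(1-\rho^2)/(1+\rho^2)$. The sketch below compares a truncated sum with this closed form; the function names are hypothetical.

```python
def limit_epsilon(acov):
    """sigma(0)^2 / sum over |v| <= len(acov) - 1 of sigma(v)^2, with acov[v] = sigma(v)."""
    total = acov[0] ** 2 + 2.0 * sum(s * s for s in acov[1:])
    return acov[0] ** 2 / total

def ar1_limit(rho):
    """Closed form of the limit when the autocorrelation is rho^|v|."""
    return (1.0 - rho ** 2) / (1.0 + rho ** 2)
```

For $\rho = 0.5$ both calculations give 0.6, far above the conservative bound $1/(T-1)$ for any long series.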

Suppose $\sigma(\cdot)$ is the autocovariance sequence of a covariance stationary autoregressive process of order $p$ with coefficients $\alpha = (\alpha_1, \ldots, \alpha_p)^T$ and residual variance $\sigma^2$ (denoted AR($p$, $\alpha$, $\sigma^2$)), i.e.,

$$\sum_{j=0}^{p} \alpha_j \sigma(j - v) = \delta_v \sigma^2, \qquad v \ge 0,$$

where $\alpha_0 = 1$ and $\delta_v$ is the Kronecker delta. Then for $p = 1$ and $p = 2$ we have the following corollary.

Corollary 1

If $\sigma(\cdot)$ corresponds to an AR(1) or AR(2) process we have

$$\lim_{T \to \infty} \epsilon_T = \begin{cases} \dfrac{1 - \alpha_1^2}{1 + \alpha_1^2}, & p = 1 \\[2ex] \dfrac{(1 - \alpha_2^2)[(1 + \alpha_2)^2 - \alpha_1^2]}{\alpha_1^2 (1 - 4\alpha_2 + \alpha_2^2) + (1 + \alpha_2)^2 (1 + \alpha_2^2)}, & p = 2. \end{cases}$$

In Figure 1 we graph the limiting values of $\epsilon_T$ for $p = 2$ for values of $\alpha_1$ and $\alpha_2$ that make the process stationary, i.e., values for which the zeros of $1 + \alpha_1 z + \alpha_2 z^2$ are outside the unit circle.

For example, if T = 101 and $\alpha_1 = -1$, $\alpha_2 = .4$, then $1/(T-1)$ is .01 while effectively a lower bound for $\epsilon_T$ is .28. Thus if one had good estimators of $\alpha_1$ and $\alpha_2$, a much less conservative test of $H_T$ and/or $H_{GT}$ could be determined.

Using an Estimator of $\epsilon$

We consider five estimators of $\epsilon$.

Each consists of forming an estimator of $\Sigma$ from the N residual time series $e_{ik} = (e_{i1k}, \ldots, e_{iTk})^T$ where

$$e_{ijk} = y_{ijk} - \bar{y}_{ij\cdot} - \bar{y}_{i\cdot k} + \bar{y}_{i\cdot\cdot},$$

and then substituting this estimator of $\Sigma$ into (2) to estimate $\epsilon$.

1)  $\hat{\epsilon}$ - Ignoring the Toeplitz form of $\Sigma$, one can estimate $\Sigma$ as one would in ordinary multivariate analysis, i.e.,
Figure 1.  Limit of Epsilon for an AR(2) Process


$$\hat{\Sigma} = \frac{1}{N} \sum_{i=1}^{g} \sum_{k=1}^{n_i} (e_{ik} - \bar{e})(e_{ik} - \bar{e})^T,$$

where $\bar{e} = \frac{1}{N} \sum_{i=1}^{g} \sum_{k=1}^{n_i} e_{ik}$. This is the traditional estimator (Huynh and Feldt [3]) used for general $\Sigma$.

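2)  $\hat{\epsilon}^{(np)}$ - Nonparametric (i.e., not assuming an AR model) pooled estimators $\hat{\sigma}(0), \ldots, \hat{\sigma}(T-1)$ of $\sigma(0), \ldots, \sigma(T-1)$ can be calculated and then $\hat{\Sigma} = \mathrm{Toepl}(\hat{\sigma}(0), \ldots, \hat{\sigma}(T-1))$.

3)  $\hat{\epsilon}^{(p)}$ - Parametric (i.e., assuming an AR model) pooled estimators $\tilde{\sigma}(0), \ldots, \tilde{\sigma}(T^*)$ of $\sigma(0), \ldots, \sigma(T^*)$ can be calculated and $\tilde{\Sigma} = \mathrm{Toepl}(\tilde{\sigma}(0), \ldots, \tilde{\sigma}(T^*))$. The integer $T^* \ge T-1$.

4)  $\hat{\epsilon}^{(\infty,np)}$ - Nonparametric limit of epsilon estimator

$$\hat{\epsilon}^{(\infty,np)} = \frac{\hat{\sigma}^2(0)}{\sum_{|v| \le T-1} \hat{\sigma}^2(v)}$$

5)  $\hat{\epsilon}^{(\infty,p)}$ - Parametric limit of epsilon estimator

$$\hat{\epsilon}^{(\infty,p)} = \frac{\tilde{\sigma}^2(0)}{\sum_{|v| \le T^*} \tilde{\sigma}^2(v)}$$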

To compare the performance of these estimators in terms of the
size of the tests of $H_T$ and $H_{GT}$, we generated 100 sets of nine zero-mean
time series of length 50 from each of twenty AR processes. These
processes were chosen to represent a wide range of time series types.

In each set the nine series were randomly divided into three groups
of three. Thus $T = 50$, $g = 3$, and $n_1 = n_2 = n_3 = 3$. For each data set, the
five estimators of $\epsilon$ were calculated and, for a given estimator $\epsilon^*$, the
p-value of each test was determined (assuming
$F_T \sim F_{\epsilon^*(T-1),\, \epsilon^*(T-1)(N-g)}$
and $F_{GT} \sim F_{\epsilon^*(T-1)(g-1),\, \epsilon^*(T-1)(N-g)}$). Now if the test using $\epsilon^*$ has
the correct size, then the 100 p-values for each of the twenty AR
models should appear to be a random sample of size 100 from a uniform
distribution on the interval zero to one (Lehmann [4], p. 150). In
Table 2 we list the results of testing the p-values for uniformity
using the Cramer-von Mises statistic, as well as descriptive
statistics for the five estimators. From this table we note:

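1) The traditional estimator $\hat{\epsilon}$ is woefully inadequate for the
types of data we are considering.

2) Using $\hat{\epsilon}^{(\infty,np)}$ leads to a poor test.

3) Any of $\hat{\epsilon}^{(np)}$, $\hat{\epsilon}^{(p)}$, and $\hat{\epsilon}^{(\infty,p)}$ leads to tests having good size.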
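
The uniformity check on the p-values can be sketched as follows. This is a minimal Python version of the usual Cramer-von Mises statistic for a fully specified uniform(0,1) null, $W^2 = 1/(12n) + \sum_i (u_{(i)} - (2i-1)/(2n))^2$; whether the original study used exactly this form of the statistic is an assumption, and the function name is hypothetical.

```python
def cramer_von_mises_uniform(p_values):
    """Cramer-von Mises statistic W^2 for testing that the values are a
    random sample from the uniform distribution on (0, 1)."""
    u = sorted(p_values)
    n = len(u)
    return 1.0 / (12 * n) + sum(
        (u[i] - (2 * i + 1) / (2.0 * n)) ** 2 for i in range(n)
    )


# A perfectly even sample attains the minimum value 1/(12 n).
evenly_spaced = [(2 * i + 1) / 8.0 for i in range(4)]
w2_even = cramer_von_mises_uniform(evenly_spaced)

# P-values bunched near zero (a test that rejects too often) give a large value.
w2_bunched = cramer_von_mises_uniform([0.01, 0.02, 0.03, 0.04])
print(w2_even, w2_bunched)
```

Small values of the statistic are consistent with uniform p-values and hence with a test of correct size; the commonly tabulated asymptotic 5% critical value is roughly 0.46.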

Studying the power of the tests of $H_T$ and $H_{GT}$ numerically is of
course very difficult, as there are so many possible alternatives. To
get some idea of the power, we generated 100 sets of 6 series of
length 50 (allocated to 2 groups of 3 series) for each of the 20 AR
models (these are the $\eta_{ijk}$'s) and then formed

$y_{ijk} = \mu_{ij} + \eta_{ijk}$

where

$\mu_{ij} = 0$ for $i = 1, 2$, $j \ne 10$, and $\mu_{ij} = \lambda$ for $i = 1$, $j = 10$,

for $\lambda = 0, 2, 4, 6, 8, 10$. Thus the means are all zero except that the
10th observation in group one is $\lambda$. In Figure 2 we give a typical
empirical power curve, again showing that the tests using $\hat{\epsilon}^{(np)}$, $\hat{\epsilon}^{(p)}$,
and $\hat{\epsilon}^{(\infty,p)}$ are competitive with the test using the true $\epsilon$.

Table 2. Results of Using Five $\epsilon$ Estimators for 20 AR Processes

Column key: e50 = $\epsilon_{50}$; einf = $\epsilon_\infty$; trad = $\hat{\epsilon}$; np = $\hat{\epsilon}^{(np)}$; p = $\hat{\epsilon}^{(p)}$;
inf,np = $\hat{\epsilon}^{(\infty,np)}$; inf,p = $\hat{\epsilon}^{(\infty,p)}$. CVMT and CVMGT are the Cramer-von Mises
statistics for the p-values of the tests of $H_T$ and $H_{GT}$.

(a) True values and means of the estimators

Row  Order  Coeffs        e50    einf   trad   np     p      inf,np  inf,p
1    1      -.8           .2403  .2195  .0827  .2246  .3192  .1289   .3074
2    1      -.5           .6127  .6000  .1018  .4923  .6584  .2786   .6497
3    1      .5            .6114  .6000  .1009  .4870  .6151  .2978   .6019
4    1      .8            .2331  .2195  .0800  .1971  .2553  .1283   .2415
5    2      -.971, .464   .4688  .4710  .0962  .3676  .4658  .2227   .4633
6    2      .019, .746    .3085  .2845  .0874  .2653  .3477  .1735   .3237
7    2      1.746, .868   .1436  .1233  .0665  .1280  .1796  .0889   .1618

(b) Variances of the estimators

Row  trad      np     p      inf,np  inf,p
1    .7x10^-4  .0021  .0028  .0016   .0031
2    .3x10^-4  .0041  .0045  .0060   .0048
3    .3x10^-4  .0037  .0038  .0058   .0040
4    .8x10^-4  .0021  .0029  .0014   .0029
5    .4x10^-4  .0011  .0014  .0023   .0017
6    .8x10^-4  .0021  .0027  .0017   .0028
7    .7x10^-4  .0004  .0005  .0003   .0006

(c) Cramer-von Mises statistics (CVMT, CVMGT)

Row  true $\epsilon$  trad        np          p           inf,np      inf,p
1    .077, .076   1.45, .975  .106, .097  .063, .175  .708, .424  .054, .158
2    .881, .031   3.13, 2.09  .935, .043  .910, .043  1.39, .446  .908, .040
3    .263, .197   2.58, 1.98  .323, .162  .245, .183  .764, .420  .247, .176
4    .056, .090   1.21, .681  .098, .033  .075, .102  .431, .198  .067, .008
5    .157, .186   2.22, 1.96  .226, .242  .171, .192  .640, .673  .173, .192
6    .143, .131   1.73, 1.76  .220, .250  .140, .101  .624, .657  .153, .131
7    .111, .082   .606, .641  .088, .106  .206, .107  .301, .320  .135, .080

Table 2 (continued)

[Only part of the continuation is legible in the source. Blank cells could not be
recovered; the sign of the second coefficient in row 9 is uncertain; for row 12 the
values -1.227, .0426, .0646, .0340, .0455, and .5106 are legible but cannot be
assigned reliably to the coefficient and epsilon columns.]

(a) True values and means of the estimators

Row  Order  Coeffs                         e50    einf   trad   np     p      inf,np  inf,p
8    2      -1.84, .861                    .0724  .0808  .0482  .0729  .1111  .0435   .1164
9    3      -.690, -.771, .612             .1827  .1862  .0723  .1576  .2216  .0968   .2219
10   3      1.174, .252, -.121             .2616  .2498  .0815  .2112  .2722  .1394   .2596
11   3      -1.404, 1.188, -.474           .4409  .4383  .0949  .3495  .4502  .2131
12   3
13   4      -.250, .7287, .0126, .2951     .4836  .4602  .0965
14   4      -2.304, 1.972, -.7915, .1724   .1123  .1079  .0586

(b) Variances of the estimators

Row  trad      np     p      inf,np  inf,p
8    .7x10^-4  .0002  .0002  .0001   .0002
9    .7x10^-4  .0006  .0008  .0007   .0008
10   .6x10^-4  .0007  .0006  .0010   .0006
11   .5x10^-4  .0014  .0011  .0026
12   .9x10^-4
13   .5x10^-4
14   .0001

(c) Cramer-von Mises statistics (CVMT, CVMGT)

Row  true $\epsilon$  trad        np          p           inf,np      inf,p
8    .060, .153   .461, .560  .072, .177  .125, .106  .705, .719  .159, .121
9    .354, .099   .896, .519  .311, .050  .428, .207  .534, .230  .431, .209
10   .703, .078   1.85, 1.45  .726, .201  .714, .076  1.05, .640  .705, .093
11   .118, .078   2.08, 1.40  .188, .056  .124, .081  .661, .310
12   .226, .087   .821, .178
13   .037, .058   2.39, 1.93
14   .133, .301   .869, .432

Figure 2. Empirical Power Curves of Tests of $H_T$ and $H_{GT}$
3. Extension to Multivariate Analysis of Variance

The extension of the method of Section 2 to the case where $y_{ijk}$
is a vector rather than a scalar is not obvious. We are currently
investigating the effect of having correlation across the levels of
one factor in a two-factor multivariate analysis of variance (MANOVA),
as this is how the correction factor $\epsilon$ was first discovered in the
univariate case (see Box [1]). A promising area of investigation is
to note that the distribution of a statistic that is a transformation
of Wilks' lambda can be well approximated by a random variable
having an F distribution (see Rao [6], p. 556).

ABSTRACT

Synthetic  aperture  radar  images  are  degraded  by  speckle.  In 
this  paper,  we  present  a multiplicative  speckle  noise  model  for 
SAR  images.  Using  this  model,  we  derive  a Wiener  filter  by 
minimizing  the  mean-squared  error  using  the  known  speckle 
statistics.  Implementation  of  the  Wiener  filter  is  discussed  and 
experimental  results  are  presented.  We  conclude  with  a discussion 
of  possible  improvements  to  this  method. 
Introduction

Synthetic  aperture  radar  (SAR)  is  a coherent  imaging  system 
[1].  SAR  imagery  suffers  from  speckle  noise  degradation.  The 
speckle  noise  results  from  coherent  illumination  of  a rough 
surface  and  its  characteristics  are  well  known  [3].  In  the 
following,  we  derive  the  Wiener  filter  based  on  a multiplicative 
noise  model,  discuss  the  filter  implementation,  and  show  the 
experimental  results  of  the  filtering.  We  conclude  the  paper  by 
outlining  planned  future  work. 

Wiener  Filter 

The speckle noise intensity is described by an exponential
probability density with identical mean and standard deviation.
In order to use the speckle statistics in reducing the noise in
SAR intensity images, we propose the following signal processing
model:

$y(n,m) = s(n,m)\, d(n,m)$    (1)

where y(n,m) = SAR intensity image
      s(n,m) = scene
      d(n,m) = speckle noise

The probability density function of the speckle noise, d(n,m), is

$\mathrm{pdf}(D) = e^{-D}$ for $D \ge 0$, and $\mathrm{pdf}(D) = 0$ for $D < 0$,

with mean = 1 and variance = 1. The mean and the standard
deviation of y(n,m) are equal to the scene, s(n,m).
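
The statement that the mean and standard deviation of y(n,m) both equal the scene value can be checked with a small simulation. The sketch below is illustrative only: it assumes a constant scene value of 5.0, a fixed random seed, and uses the Python standard library generator `random.expovariate` for the unit-mean exponential speckle of model (1).

```python
import random
import statistics


def simulate_speckle(scene_value, n_samples, seed=0):
    """Draw y = s * d, with d exponential of mean 1, for a constant scene s."""
    rng = random.Random(seed)
    return [scene_value * rng.expovariate(1.0) for _ in range(n_samples)]


samples = simulate_speckle(5.0, 20000)
sample_mean = statistics.mean(samples)
sample_std = statistics.pstdev(samples)
# Both values should be close to the scene value 5.0,
# so the ratio mean/std should be close to 1.
print(sample_mean, sample_std, sample_mean / sample_std)
```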

Using (1), we design a Wiener filter to estimate s(n,m) given
y(n,m). The Wiener filter is the optimal linear filter in the
sense that it minimizes the expected value of the squared
error between the true and the estimated signals [4]. The
estimate of the scene is denoted $\hat{s}(n,m)$ and is determined by
filtering y(n,m) such that

$\hat{s}(n,m) = h(n,m) * y(n,m)$

where h(n,m) denotes the Wiener filter and * indicates
convolution. In the frequency domain,

$\hat{S}(\omega_1,\omega_2) = H(\omega_1,\omega_2)\, Y(\omega_1,\omega_2)$

where capital letters denote the Fourier transformed functions.

We minimize error = $E((s(n,m) - \hat{s}(n,m))^2)$, where E(.) denotes the
expected value. Using the orthogonality principle, which states
that the best linear estimate is obtained if the error between the
desired and estimated signals is uncorrelated with the observations, we
have

$R_{ys}(n,m) = h(n,m) * R_{yy}(n,m)$    (2)

where $R_{ys}(n,m) = E(y(l,k)\, s(l-n,k-m))$,
      $R_{yy}(n,m) = E(y(l,k)\, y(l-n,k-m))$,
and stationarity is assumed.

Equation (2) is the Wiener-Hopf equation for this problem.
From the model (1) and assuming the scene and the noise are
uncorrelated, (2) becomes, in the frequency domain,

$H(\omega_1,\omega_2) = \frac{P_s(\omega_1,\omega_2)}{P_y(\omega_1,\omega_2)} = \frac{P_s(\omega_1,\omega_2)}{P_s(\omega_1,\omega_2) * P_d(\omega_1,\omega_2)}$    (3)

where
$P_s(\omega_1,\omega_2)$ is the power density spectrum of s(n,m),
$P_d(\omega_1,\omega_2)$ is the power density spectrum of d(n,m),
and * denotes convolution.

The Wiener filter, given by (3), requires knowledge of the
power density spectra (PDS's) of both the noise and the scene. We
now discuss a method of determining the power spectra and
implementing the filter.

Implementation

As derived by Goodman [3], the autocorrelation function of
the speckle noise is the sum of a constant term and a function
which is dependent on the scattering area. We assume that the
scattering area is such that the PDS of the noise is a bandlimited
white spectrum with an impulse at DC corresponding to a constant
offset in the correlation domain. Using the fact that half of the
noise power is contained in the DC component and half at other
frequencies [2], we have

$P_d(\omega_1,\omega_2) = \delta(\omega_1,\omega_2) + \frac{1}{W_1 W_2}$    (4)

where $\delta(\omega_1,\omega_2)$ = two-dimensional impulse function
and $|P_d(\omega_1,\omega_2)| = 0$ for $|\omega_1| > W_1/2$ and $|\omega_2| > W_2/2$.

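Using (4) in (3), the Wiener filter becomes

$H(\omega_1,\omega_2) = \frac{P_s(\omega_1,\omega_2)}{P_s(\omega_1,\omega_2) + \frac{1}{W_1 W_2} \int_{-W_2/2}^{W_2/2} \int_{-W_1/2}^{W_1/2} P_s(\omega_1,\omega_2)\, d\omega_1 d\omega_2}$

and using

$\int_{-W_2/2}^{W_2/2} \int_{-W_1/2}^{W_1/2} P_y(\omega_1,\omega_2)\, d\omega_1 d\omega_2 = 2 \int_{-W_2/2}^{W_2/2} \int_{-W_1/2}^{W_1/2} P_s(\omega_1,\omega_2)\, d\omega_1 d\omega_2$

we finally have

$H(\omega_1,\omega_2) = \frac{P_y(\omega_1,\omega_2) - \frac{1}{2 W_1 W_2} \int_{-W_2/2}^{W_2/2} \int_{-W_1/2}^{W_1/2} P_y(\omega_1,\omega_2)\, d\omega_1 d\omega_2}{P_y(\omega_1,\omega_2)}$    (5)

Equation (5) describes the Wiener filter to be implemented. Note
that only the PDS of the image is required.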

We estimate the PDS of y(n,m) by averaging its periodograms. If
the underlying process is white Gaussian, the variance of the
averaged-periodogram estimator is reduced by a factor of 1/N (its
standard deviation by $1/\sqrt{N}$) if N periodograms are averaged [6].
In this work, we average four periodograms to estimate
$P_y(\omega_1,\omega_2)$ and determine $H(\omega_1,\omega_2)$.

Because the filter is of infinite duration, it must be
truncated. In practice, most of the energy is concentrated near
the origin; thus truncation does not cause much difficulty.

In practice, the Wiener filter of (5) is approximated using
discrete Fourier transforms. Using the averaged-periodogram
estimate of the PDS of y(n,m), denoted by $\hat{P}_y(k_1,k_2)$, (5) is replaced by

$H(k_1,k_2) = \frac{\hat{P}_y(k_1,k_2) - \frac{1}{2N^2} \sum_{k_1=0}^{N-1} \sum_{k_2=0}^{N-1} \hat{P}_y(k_1,k_2)}{\hat{P}_y(k_1,k_2)}$    (6)

where N is the discrete Fourier transform length.

Figure 1 shows the algorithm based on (6).
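
The discrete filter (6) amounts to subtracting half of the average PDS level from every bin and dividing by the bin value. A minimal Python sketch of that single step follows; the function name and the list-of-lists representation of the PDS estimate are assumptions, every bin is assumed strictly positive, and the periodogram averaging and filter truncation described above are not shown.

```python
def wiener_gain(p_y):
    """Evaluate equation (6) on an N x N averaged-periodogram estimate.

    p_y is a list of N rows of N positive PDS values; the result has the
    same shape and holds H(k1, k2).
    """
    n_rows = len(p_y)
    n_cols = len(p_y[0])
    total = sum(sum(row) for row in p_y)
    # (1 / (2 N^2)) times the double sum of the PDS estimate.
    offset = total / (2.0 * n_rows * n_cols)
    return [[(value - offset) / value for value in row] for row in p_y]


flat = wiener_gain([[2.0, 2.0], [2.0, 2.0]])
peaked = wiener_gain([[13.0, 1.0], [1.0, 1.0]])
print(flat)
print(peaked)
```

For a flat spectrum the gain is 0.5 in every bin. For a spectrum dominated by low frequencies, bins whose estimated PDS falls below the offset receive a negative gain; the text does not say how such bins were treated, so any clipping would be an additional design choice.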

Experimental  Results 

We  have  applied  the  signal  processing  algorithm  as  described 
in  Fig.  1 on  a SEASAT  SAR  image  of  an  agricultural  field  (SEASAT 
orbit  number  1355).  Figure  2(a)  is  the  given  intensity  image  and 
Fig.  2(b)  is  the  Wiener  filtered  image.  These  images  indicate 
that  filtering  reduces  the  speckle  noise  significantly. 

Figure 3(a) is the Wiener filter in the frequency domain, which has
a low-pass characteristic since the data is basically a low-pass
signal, as shown in Fig. 3(b). Figure 4 shows slices of the impulse
response of the truncated filter, which indicate that most of the
energy is indeed concentrated near the origin.

We define the "equivalent number of looks" (ENL) of the image
by ENL = (mean/standard deviation)^2. The ENL of an area with uniform
reflectivity  is  equal  to  1 because  of  the  exponential  probability 
density  function  of  speckle  noise.  For  the  filtered  image  of  Fig. 
2(b),  the  computed  ENL  is  approximately  2.2.  Figure  5 shows  the 
2-look  image  obtained  by  incoherent  averaging  of  the  image  [7]. 

By  comparing  Figures  2(b)  and  5,  we  conclude  that,  qualitatively, 
the  speckle  noise  of  the  filtered  image  is,  as  expected,  reduced 
to  that  of  the  2-look  image. 


Conclusions 


In this paper, we have derived the Wiener filter for a
multiplicative speckle noise model using a priori knowledge of the
noise PDS. An algorithm for implementation of the Wiener filter
was  discussed.  The  results  of  Wiener  filtering  were  given  and 
compared  to  the  2-look  image.  The  Wiener  filtering  significantly 
reduced  the  speckle  noise. 

We  conclude  the  paper  by  outlining  three  extensions  of  the 
work  which  are  to  be  investigated.  First,  segmentation  of  the 
image  will  be  examined.  In  the  derivation  of  the  Wiener  filter, 
we  assumed  that  the  scene  was  stationary.  In  general,  the  scene 
is  not  stationary  and  by  segmenting  the  image  into  smaller  pieces, 
we  can  improve  the  "stationarity"  of  the  scene.  Second,  other  PDS 
estimators  will  be  examined.  In  the  implementation,  we  used  the 
averaged  periodogram  to  estimate  the  PDS's.  PDS  estimators  such 
as  MLM  or  MEM  [5],  which  have  better  resolution,  might  be  employed 
to  improve  the  estimate.  Third,  an  alternative  signal  processing 
model  which  includes  the  system  response  function  will  be 
examined.  By  using  (1),  we  assume  that  the  system  response 
function  of  the  imaging  system  is  an  impulse.  By  using  an 
alternate  signal  processing  model  which  includes  the  imaging 
system  response  function,  we  can  remove  the  effect  of  the 
imperfect  imaging  system. 


Fig. 1 Wiener filtering algorithm
Fig. 2(b) Filtered image
Fig. 3(b) Fourier transform of original image
Fig. 5 2-look image


References 


[1] Elachi, C., Bicknell, T., Jordan, R. L., and Wu, C.,
Spaceborne Synthetic-Aperture Imaging Radars: Applications,
Techniques and Technology, Proc. IEEE 70 (1982) 1174-1209.

[2] Goldfinger, A. D., Estimation of Spectra from Speckled Image,
IEEE Transactions on Aerospace and Electronic Systems AES-18
(1982) 675-681.

[3] Goodman, J. W., Statistical Properties of Laser Speckle
Pattern, in Dainty, J. C. (ed.), Laser Speckle (Springer-Verlag,
NY, 1975).

[4] Kondo, K., Ichioka, Y., and Suzuki, T., Image Restoration by
Wiener Filtering in the Presence of Signal-dependent Noise,
Applied Optics 16 (1977) 2254-2258.

[5] McClellan, J. H., Multi-dimensional Spectral Estimation,
Proc. IEEE 70 (1982) 1029-1039.

[6] Oppenheim, A. V., and Schafer, R. W., Digital Signal
Processing (Prentice-Hall Inc., Englewood Cliffs, N. J., 1975).

[7] Porcello, L. J., Massey, N. G., Innes, R. B., and Marks, J.
M., Speckle Reduction in Synthetic-Aperture Radars, J. Opt.
Soc. Am. 66 (1976) 1305-1311.


IMAGE  MATCHING  USING  GENERALIZED  HOUGH  TRANSFORMS 


Larry  S.  Davis 
Fu-pei  Hu 
Vincent  Hwang 
Les  Kitchen 


University of Maryland, College Park


ABSTRACT 

This report describes an image matching system specifically
designed to match dissimilar images. A set of blobs and ribbons is
first extracted from each image, and then generalized Hough transform
techniques are used to match these sets and compute the transformation
that best registers the images. An example of the application of
the approach to one pair of remotely sensed images is presented.
This report describes progress to date on our research into the
problem of matching "dissimilar" images. The dissimilarity may be due
to significant changes in the scene being imaged or to the utilization
of somewhat different sensors to image the same scene. In either event,
one cannot expect to be able to match, or register, such images using
conventional image registration techniques based on either direct
intensity cross correlation or even on somewhat more sophisticated feature
(e.g., edge) correlation techniques. Instead, we suggest that the images
to be matched be subjected to a rather complex analysis in order to
construct descriptions of the images in terms of relatively high level
pieces (in the examples shown in this paper the pieces are blobs and
ribbons). These pieces can, in principle, be interpreted in the context
of a model for the classes of entities that are likely to appear in
the images, and it is the resulting symbolic descriptions which are
matched to register the images. This interpretation step is not
discussed in this paper, but is a topic currently under investigation in
our laboratory. Related work on symbolic image matching appears in
Price [5].

Blob  and  Ribbon  Detection 

In an image, the blobs and ribbons extracted usually correspond to
interesting objects. For example, in aerial imagery, blobs
may correspond to houses and ribbons may correspond to roads. What
follows is a description of algorithms for blob and ribbon extraction.

Blob Extraction:

A blob  is  a compact  homogeneous  region.  In  order  to  extract  blobs, 
we  first  segment  the  image  into  homogeneous  regions.  Then  we  compute 
the  properties  of  each  homogeneous  region  and  extract  those  regions  which 
satisfy  the  blob  criteria. 

To  segment  the  image  into  homogeneous  regions,  we  first  convolve 
the  image  with  a Laplacian  operator.  The  places  where  the  convolved 
result  changes  sign  correspond  to  the  locations  of  intensity  changes  in 
the original image [3]. If we assume the intensity of the regions to
be extracted is lighter than the intensity of the background, the
regions to be extracted are those regions with positive value in the
convolved image.

The  scale  of  the  Laplacian  operator  determines  the  scale  of  the 
positive  regions  in  the  convolved  image.  If  we  know  the  scale  of  the 
blobs  we  want  to  extract,  we  can  select  a Laplacian  operator  with  the 
appropriate  scale. 

In our method, the Laplacian operator is a difference of averages
between two square windows; the Laplacian's scale is specified by the
sizes of the two windows used. Uniform weight is assigned to every
point in the mask.

After the positive regions in the convolved image are extracted,
we need to compute their properties. Assume the size of a region is
A and its perimeter is P. The compactness of the region is defined as:

compactness = P*P / A

In the experiment presented in the next section, 18 is used as the
upper bound on the compactness of regions. All regions with
compactness smaller than 18 are considered to be compact. The value 18
is obtained by computing the compactness measurement for a rectangle
whose length is twice its width.
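
The threshold of 18 follows directly from the definition: a rectangle of width w and length 2w has perimeter 6w and area 2w*w, so P*P/A = 36/2 = 18. The short Python sketch below repeats this arithmetic; the function names are illustrative and not part of the original system.

```python
def compactness(perimeter, area):
    """Compactness measure P*P / A; smaller values mean more compact regions."""
    return perimeter * perimeter / area


def rectangle_compactness(width, length):
    """Compactness of an ideal width-by-length rectangle."""
    return compactness(2.0 * (width + length), width * length)


threshold = rectangle_compactness(1.0, 2.0)  # 2:1 rectangle
square = rectangle_compactness(1.0, 1.0)     # a square is more compact
print(threshold, square)
```

Under this measure a square scores 16, so it passes the blob criterion, while any rectangle more elongated than 2:1 scores above 18 and is rejected.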

All  the  regions  which  satisfy  the  compactness  criterion  are  blobs. 
However,  since  we  apply  a large  scale  Laplacian  operator  to  the  image, 
there  may  be  some  artifacts  in  the  convolved  image.  For  example,  two 
separated  compact  regions  in  the  image  may  be  merged  into  a connected 
positive  region  in  the  convolved  image.  The  merged  region  is  usually 
not compact. To recover from such artifacts, we apply an 8-connected
shrinking operation to the convolved image. This may break some
regions into several smaller regions. All the newly generated regions
which satisfy the compactness criterion are also blobs.

Ribbon  Extraction: 

A ribbon  is  an  elongated  homogeneous  region.  As  discussed  above, 
we  can  extract  homogeneous  regions  by  an  edge  detection  operation.  We 
need  to  decompose  these  regions  into  subregions  which  are  elongated 
and  whose  width  along  the  skeleton  of  the  region  is  some  constant.  In 
the  following,  the  term  "ribbon"  refers  to  a constant  width  ribbon 
with  some  minimal  length. 

In our method, we first apply a topology-preserving 8-connected
thinning operation [6] to the convolved image. This operation
produces the skeleton map of the regions in the convolved image. We want
to decompose the skeleton into line segments such that all points on
the same line segment have nearly the same distance to their nearest
background points.

A branch  point  is  a point  on  the  skeleton  map  which  is  adjacent 
to  at  least  three  different  skeleton  points.  After  we  compute  the 
skeleton  of  a region,  we  delete  all  branch  points  on  the  skeleton. 

For each connected (8-connected) line segment in the resulting
skeleton map, we compute its ideal width by histogramming
the widths along the skeleton and choosing the most frequently
encountered width.

The  ideal  width  of  a skeleton  line  segment  is  used  to  determine 
whether  a point  on  the  skeleton  line  segment  is  part  of  the  skeleton 
of  some  ribbon.  Suppose  the  ideal  width  of  a skeleton  line  segment 
is w. A point P on the line segment is on the skeleton of some ribbon
(i.e., is a ribbon point) iff:

w - e < width at P < w + e, where e is a small tolerance on the width.
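
The ideal-width and ribbon-point rules can be sketched in a few lines of Python. The sketch assumes that the widths along one skeleton line segment are already available as a list of integers; the function names and the example tolerance are hypothetical.

```python
from collections import Counter


def ideal_width(widths):
    """Most frequently encountered width along a skeleton line segment."""
    return Counter(widths).most_common(1)[0][0]


def ribbon_points(widths, tolerance):
    """Flag each skeleton point whose width lies strictly within
    `tolerance` of the ideal width of the segment."""
    w = ideal_width(widths)
    return [w - tolerance < x < w + tolerance for x in widths]


widths = [3, 3, 4, 3, 7, 3]
flags = ribbon_points(widths, 1)
print(ideal_width(widths), flags)
```

Here the ideal width is 3, and with a tolerance of 1 only the points of width exactly 3 are flagged; the points of width 4 and 7 break the segment into shorter runs of ribbon points.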

Long, connected sets of ribbon points constitute ribbons. In the
experiment described in the next section, only blobs are used to
compute the registration; we are currently extending our registration
system to include ribbons.

Image  Matching 

Once  a description  of  the  ribbons  and  blobs  in  two  images  has  been 
computed,  these  descriptions  can  be  used  to  match  the  two  images  using 
Generalized  Hough  Transforms  (GHTs).  The  GHT  is  a generalization  of 
the  classical  Hough  transform  algorithms  which  were  used  to  detect 
simple  shapes  such  as  lines,  circles  and  ellipses  in  images  (Ballard 
[1],  Yam  and  Davis  [9]  describe  the  generalizations). 
The GHT can be simply illustrated by considering the problem of
matching point patterns under simple transformations. Let P={p1,...,pn}
be one set of points in the plane (P might correspond to the
locations of features in one of two images that we are attempting to
register) and let Q={q1,...,qm} be the second of the two point patterns
(Q might be the locations of features in a small window of the second
image). The problem is to determine if Q matches well against a
subset of P with respect to a given set of point transformations (such
as translations and rotations). One straightforward way of
determining how well Q matches P is to apply the transformations, one at
a time, to Q and, for each transformation, count how many points from
Q are mapped onto points in P. In practice, there are only a finite
number of transformations because of the bounded size of the images
from which P and Q are extracted, and the limited precision to which
we represent the positions of the points in P and Q. We should point
out that simple binary correlation algorithms for matching under
translation work exactly in this way since they slide an image containing Q
over all positions in the image containing P. If T is the number of
possible transformations, then this straightforward algorithm requires
time proportional to Tmn.

This turns out, however, not to be the computationally most
efficient way to match Q and P. If we are able to commit extra storage,
then we can dramatically cut down on the amount of computation. The
storage required is proportional to the number of possible
transformations (although later we will briefly discuss methods which often
reduce the amount of storage required). One needs to construct an
array of accumulators, with one accumulator for each of the possible
transformations. After the GHT algorithm operates, the value stored
in any of these accumulators will be the number of points in Q mapped
onto (or, more accurately, tolerably close to) some point in P by the
transformation represented by that accumulator. Consider the special
case now where T contains only translations. Let HT be the array of
accumulators. Then the GHT algorithm is:

For each point q = (xq,yq) in Q
    For each point p = (xp,yp) in P
        Let dx = xp-xq
        Let dy = yp-yq
        HT(dx,dy) = HT(dx,dy) + 1
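
The translation-only algorithm above can be sketched in Python as follows. A `collections.Counter` keyed by (dx, dy) stands in for the accumulator array HT; this choice, the function name, and the small point sets are illustrative assumptions.

```python
from collections import Counter


def ght_translation(P, Q):
    """Vote for the translation (dx, dy) implied by every pairing of a
    point q in Q with a point p in P."""
    votes = Counter()
    for xq, yq in Q:
        for xp, yp in P:
            votes[(xp - xq, yp - yq)] += 1
    return votes


# Q is the first three points of P shifted by (-4, -7),
# so the translation (4, 7) maps Q back onto P.
P = [(0, 0), (1, 0), (5, 5), (3, 4)]
Q = [(-4, -7), (-3, -7), (1, -2)]
best, count = ght_translation(P, Q).most_common(1)[0]
print(best, count)
```

The work is proportional to mn rather than Tmn, since each pair of points casts exactly one vote; the peak accumulator holds three votes, one for each point of Q.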

In this simple case, the comparison of a point in P with a
point in Q results in incrementing only a single accumulator in the
array HT. This is because, of course, only two points are needed to
completely determine the transformation. More generally, however,
comparing a single point in P with a single point in Q will not
specify a unique transformation, but will rather specify a family of
transformations corresponding to some subspace of the space of
transformations represented by the array HT. One can ordinarily cut down
on the size of this subspace by comparing, e.g., pairs of points from
P against pairs of points from Q. However, unless one can introduce
some heuristics to limit the number of such pairs (or, more generally,
triples, quadruples, etc.) such an approach quickly becomes
computationally unfeasible.

Consider next the slightly more complicated situation where T
consists of not only translations, but image plane rotations as well.
Now, the array HT is a three-dimensional array, the third dimension
needed to represent the rotation parameter. The GHT algorithm in
this case is:

For each q = (xq,yq) in Q
    For each p = (xp,yp) in P
        For r = 0, 2π, by dr
            xq' = xq cos r - yq sin r
            yq' = xq sin r + yq cos r
            dx = xp-xq'
            dy = yp-yq'
            HT(dx,dy,r) = HT(dx,dy,r) + 1

Here,  we  first  apply  a rotation  to  point  q and  then  determine 
the  unique  translation  that  will  map  the  rotated  version  of  q onto  p. 
Notice  that  it  would  not  have  been  appropriate  to  have  fixed,  e.g., 
dx  and  then  attempted  to  determine  a dy  and  r which  would  map  q onto 
p since  for  most  dx  no  such  dy  and  r would  exist.  We  should  also  point 
out  that  the  values  of  r,  dx  and  dy  computed  by  the  above  algorithms 
have  to  be  subjected  to  some  truncation  so  that  they  can  be  associated 
with  an  accumulator  in  HT. 
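
A minimal Python sketch of the rotation-plus-translation case follows. It assumes a standard counterclockwise rotation of q about the origin, quantizes the rotation to `n_angles` equal steps, and rounds the translation to the nearest integer so that it can index an accumulator; these quantization choices, the `Counter` accumulator, and the function name are assumptions made for illustration.

```python
import math
from collections import Counter


def ght_rotation_translation(P, Q, n_angles):
    """Vote over (dx, dy, k): rotate q by the k-th angle, then find the
    translation that carries the rotated point onto p."""
    votes = Counter()
    for xq, yq in Q:
        for xp, yp in P:
            for k in range(n_angles):
                r = 2.0 * math.pi * k / n_angles
                xr = xq * math.cos(r) - yq * math.sin(r)
                yr = xq * math.sin(r) + yq * math.cos(r)
                # Quantize the translation to the nearest accumulator cell.
                votes[(round(xp - xr), round(yp - yr), k)] += 1
    return votes


# Q was built so that rotating it by 90 degrees (k = 1 of 4 steps)
# and translating by (3, 1) reproduces P.
P = [(0, 0), (2, 0), (0, 1)]
Q = [(-1, 3), (-1, 1), (0, 3)]
best, count = ght_rotation_translation(P, Q, 4).most_common(1)[0]
print(best, count)
```

Each pair of points now casts one vote per quantized rotation, so the cost grows with the number of rotation steps as well as with mn.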

The above algorithm can be easily adapted to matching pairs of
blob patterns. We associate a position (e.g., the centroid) with
each blob, and then the remaining attributes of the blob (e.g., size,
orientation, compactness) can be used both to limit the pairs of blobs
which are considered as possible matches, and to bound the possible
transformations which can relate the blobs. For example, one can
combine the compactness and orientations of two blobs to limit the number
of rotations which need be considered when the two blobs are compared.
Intuitively, if both blobs are very compact (i.e., nearly round)
then one cannot place too much confidence in the estimate of
orientation of the blob, so that perhaps all possible rotations must be
considered. On the other hand, if both blobs are relatively elongated,
then one might only consider a small set of rotation angles centered
around the orientational difference of the axes of the two blobs.

We now turn to the problem of representing the space of
transformations. The most straightforward representation is to construct
an n-dimensional array, one dimension for each parameter in the set of
transformations. While this is reasonable for low dimensional
transformations (such as translations), it is not a reasonable approach for
higher dimensional transformations. We can identify at least three
alternative approaches to direct representation of the higher dimensional
array.

1) Multiresolution - initially, use a very coarsely quantized
high dimensional array (for example, for rotations and
translations we might initially quantize the translation parameters
to every 10-20 pixels and the rotation parameter to every
10-20 degrees). This will make the size of the higher dimensional
array manageable. Compute the GHT using this coarse
representation, and find the most likely transformation(s). Using the
same storage, compute a second GHT, but with the range of
the parameters now restricted by the coarse match. This
approach was used by Stockman [8].

2) Projections - Compute various projections of the high
dimensional array, and search for consistent and highly
likely transformations in the projections. For example,
if the set of transformations includes translations (dx,
dy) and image plane rotations (r), then we can compute
the (dx,r) and (dy,r) projections of the three-dimensional
(dx,dy,r) parameter space, and choose the peaks from (dx,
r) and (dy,r) that agree on the rotation. This is the
approach used in the experiments presented in the next
section.

3) Adaptive quantization - Several data structures have been
proposed which essentially provide a form of adaptive
quantization for representing data distributions in high
dimensional spaces. These data structures are based on
a recursive decomposition of the space into pieces; by
attempting to equalize the probability that a data point
falls into any element of the decomposition, parts of the
space that have higher density of data points are
represented at higher resolution. Examples of such data
structures are Sloan [7] and O'Rourke [4]. In the
former, the decomposition is regular (i.e., subspaces are
split in "half" at each stage of the decomposition), while
the latter constructs an irregular decomposition.

It is also possible to utilize the GHT algorithm to match
images based on descriptions of the ribbons that appear in the
images. In Davis [2], we described a GHT algorithm for matching
patterns of geometric entities, such as straight line and circular
arc segments. This algorithm can be easily adapted to the case
where the segments have additional properties, such as the width
property that is associated with ribbons.
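
The projection strategy of item 2 above can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the blob format `(x, y, orientation in degrees)`, the function name `match_by_projections`, the bin sizes, and the rule of scoring a pair of peaks by the smaller of their two vote counts are all assumptions made for illustration.

```python
import math
from collections import Counter

def match_by_projections(model, image, t_step=1.0, r_step=2.0):
    """Estimate (dx, dy, r) from blobs given as (x, y, orientation_deg).

    Hypothetical sketch: every model/image blob pair votes for one
    rotation and translation, but only the (dx, r) and (dy, r)
    projections of the 3-d accumulator are stored.
    """
    votes_dx_r = Counter()  # (dx, r) projection
    votes_dy_r = Counter()  # (dy, r) projection
    for mx, my, m_theta in model:
        for ix, iy, i_theta in image:
            # Hypothesis: this model blob corresponds to this image blob.
            r_bin = round((i_theta - m_theta) / r_step)
            r = math.radians(r_bin * r_step)
            # Rotate the model position about the origin, then read off
            # the translation that carries it onto the image blob.
            rx = mx * math.cos(r) - my * math.sin(r)
            ry = mx * math.sin(r) + my * math.cos(r)
            votes_dx_r[(round((ix - rx) / t_step), r_bin)] += 1
            votes_dy_r[(round((iy - ry) / t_step), r_bin)] += 1
    # Keep the pair of peaks that agree on the rotation bin.
    best = None
    for (dx_bin, r1), v1 in votes_dx_r.items():
        for (dy_bin, r2), v2 in votes_dy_r.items():
            if r1 == r2 and (best is None or min(v1, v2) > best[0]):
                best = (min(v1, v2), dx_bin * t_step, dy_bin * t_step, r1 * r_step)
    return None if best is None else best[1:]
```

Storage here grows with the two 2-d projections rather than with the full (dx,dy,r) array, which is the point of the approach; the price is that peaks must be reconciled afterwards through the shared rotation parameter.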

Experimental  Results 

We have applied the GHT matching algorithm to blob representations
of several image pairs; in this section we will present the results of
one such experiment. Figures 1-2 contain two images from a pair of
aerial photographs of a suburban area. Figure 2a contains just that part
of the second photograph that we will match against Figure 1a. Figures 1b and
2b show the blobs detected by the algorithm described in Section 2, and
Tables 1-2 contain descriptions of the blobs (position, orientation of
principal axis, size, and compactness) extracted from the two images.

The GHT algorithm assumed that the matching transformation
consisted of an image plane translation and rotation, so that the Hough
transform is a three-dimensional space. We adopted the strategy of
computing only projections of the three-dimensional space, and chose the
(dx,r) and (dy,r) projections. The projected Hough transforms are
displayed in Figure 3. The registration is accurate to within one
pixel in translation and 2° in rotation.

Figure 1. Frame 1 (a) and extracted blobs (b).

Figure 2. Section of frame 2 (a) and extracted blobs (b).




ANALYSIS OF SUBPIXEL REGISTRATION ACCURACY


David  Lavine 
L.N.K.  Corporation 


Laveen  N.  Kanal 
L.N.K.  Corporation 


Carlos  A.  Berenstein 
University  of  Maryland 
L.N.K.  Corporation 


Eric Slud
University of Maryland
L.N.K. Corporation


Charles  Herman 
L.N.K.  Corporation 


TABLE  OF  CONTENTS 


SECTION 1.0 INTRODUCTION

SECTION  2.0  GEOMETRIC  ACCURACY 

2.1  DIGITAL  STRAIGHT  LINE  SEGMENT  PARAMETER  ESTIMATION 

2.2  FEASIBLE  REGION  SHAPE 

2.3  INFINITE  DIGITAL  LINES 

2.4  INVARIANT  LINE  MEASURE 

2.5  DIGITAL  LINE  - PROBABILISTIC  ANALYSIS 

2.6  DIGITAL  LINES  - POINTS  MISSING 

2.7  DIGITAL LINES  - POINTS  MISSING,  POINTS  ADDED 

SECTION 3.0 SUBPIXEL TRANSLATION - REGISTRATION OF STATIONARY RANDOM
FIELDS

3.1  NEIGHBORHOOD  - CONSISTENCY  OF  MAXIMUM  - CORRELATION 
ESTIMATION 

3.2  INTERPOLATION  USING  PIXEL-DISCRETIZED  IMAGES 

3.3 SUMMARY AND PROPOSED NUMERICAL EXPERIMENTS

SECTION  4.0  MAXIMUM  LIKELIHOOD  CORNER  DETECTION 

4.1  THE  MODEL 

4.2  RESULTS 

4.3  CONCLUSIONS 

4.4  INTERPOLATION  EXPERIMENTS 

SECTION 5.0 COMPARISON OF CORRELATION, LSE AND MLE FOR IMAGE MATCHING

5.1  CORRELATION  AND  LSE 

5.2  LSE  AND  MLE 

SECTION  6.0  CONCLUSIONS  AND  FUTURE  WORK 


REFERENCES 




ABSTRACT 

Geometric and probabilistic models for subpixel accuracy are
developed. The geometric models bound the error in offset estimation
using the pixels in an observed digital straight line. One probabilistic
model bounds the error of the offset estimate for continuous images.
The other model bounds the error for discrete images given that one is
in the correct pixel.


NOTATION


$\lfloor x \rfloor$ - greatest integer $\le x$

$\lceil x \rceil$ - least integer $\ge x$

(m,n) - greatest common divisor of m and n

L(a,b) - line joining points a and b

$\phi(n)$ - Euler totient function - the number of positive integers
less than or equal to n which are relatively prime to n

$\mu(n)$ - the Moebius function, defined as follows:

$\mu(1) = 1$;

if $n > 1$, let $n = p_1^{a_1} \cdots p_K^{a_K}$ be the prime
decomposition of n. Then

$\mu(n) = (-1)^K$ if $a_1 = a_2 = \cdots = a_K = 1$

$\mu(n) = 0$ otherwise


Section  1.0  Introduction 


The problem of aligning a sensed image to a reference image to
less than a pixel accuracy has received considerable attention in recent
years, but no theoretical basis for these efforts has been established.
This report describes our work in the development of models for the
analysis of subpixel accuracy. We have pursued several independent
avenues of research in this initial study. These analyses will be
combined in the coming year to provide a more complete analysis of the
problem.

Two complementary approaches to the subpixel registration problem
were undertaken in this study. The first approach has a deterministic
geometric orientation, while the second is primarily statistical. In
the first approach, we assume an approximate registration of a sensed
image containing a linear feature to a reference image is available.
Using the location of the observed pixels and the information that the
corresponding reference feature is straight, we derive bounds on the
accuracy to which the reference and sensed image can be matched. These
error bounds are related to the properties of the feature, such as its
length and angle. These relationships can then be used to establish
criteria for the selection of good reference images. In our most
restrictive model we find that subpixel accuracy is readily achievable.
As we examine less restrictive models in the continuation of this work
we hope to achieve more realistic bounds.

This report focuses on modeling of the subpixel registration problem to
obtain bounds on registration accuracy and to develop model based methods.
Consequently, we generally refer to previous subpixel algorithms only
when they are relevant to the modeling and analysis problems. A previous
survey of subpixel methods [Ka] ultimately led to the present study.

The current study consists of three main segments. First, we
studied the registration accuracy which could be achieved by matching
geometric figures, such as straight lines, between images. This work,
described in Section 2, assumes the geometric figure has been extracted
from the sensed image, and is known to lie in the reference image. The
essence of the approach is that a slight shift in a real world edge can
cause a substantial change in the digitization of that edge. We propose
three progressively more realistic models. The first model was analyzed
and it was shown that a high degree of subpixel accuracy can be attained
under the assumptions of the model. Future work will deal with the less
restrictive forms of this model.

The second segment of our study develops bounds on subpixel registration
accuracy using statistical models. Two cases are considered:
matching of continuous images and matching of discrete images. In the
continuous case we derive bounds on registration accuracy, while in the
discrete case we derive bounds on subpixel accuracy given that we are
in the correct pixel.

The third part of our study dealt with the problem of maximum
likelihood based estimation of the registration offset. Since the first
two phases of the work assumed pixel registration was available, we felt
it necessary to examine the credibility of this assumption. A maximum
likelihood procedure was developed for estimating the location of a
corner such as a field boundary in an image. Interpolation of the
correlation function did not prove to be useful in synthetic imagery
from this model, though this work is in a preliminary stage.

Maximum  likelihood,  correlation  and  least  squares  are  all  used  in 
image  matching.  Confusion  as  to  the  interrelationships  between  these 
methods  pervades  the  literature.  We  have  included  a section  describing 
work  in  which  we  establish  conditions  under  which  these  methods  are 
equivalent. 

We have developed both geometric and stochastic models for subpixel
accuracy. Under restrictive model assumptions, the geometric method
leads to bounds on subpixel accuracy. The statistical modeling has led
to error bounds which will be examined in experimentation in the
continuation of this work. There will also be a fusion of parts of the geometric
and stochastic modeling. We think this initial work has provided useful
models and opened up many paths for continued exploration into
progressively more realistic models for subpixel accuracy.


Section 2.0 Geometric Accuracy

Matching edges in sensed and reference images can be used for
registration. The degree to which the position of a real-world edge,
such as a field boundary, can be located in imagery depends heavily upon
one's knowledge of the scene and the sensors. Edge detectors can be used
to locate reasonable candidates for edge points and then an edge can be
more precisely fit using these points. Alternatively, an estimate of
subpixel edge location can be formed directly from the grey levels [Hy-Ba].
Hybrid approaches may also be adopted. In this section, we study the
accuracy attainable using the first of these approaches, which we call the
geometric accuracy approach.

Before launching into a description of our models for geometric
accuracy, it is useful to consider those aspects of the registration problem
we wish to capture in our models. The heart of our approach is to estimate
the position of an image edge to subpixel accuracy and use this information
to define a translation between the sensed and the reference image. In
the ideal case, the grey levels on each side of the edge are constant off
the edge pixels and the edge pixel grey levels are a simple weighted
average of these two grey levels. If all grey levels are possible and the
edge pixels are all known then the position of the edge can be exactly
determined. Such a situation is clearly unrealistic but it serves as a
starting point for approximation.

Most current methods for attaining subpixel accuracy employ some type
of interpolation of the correlation function. If such a method is to
achieve subpixel accuracy, the digital correlation function must be able
to achieve pixel accuracy. In our work, we assume pixel accuracy is
available either through correlation or other methods. Thus, in the simple
case of a one-dimensional shift any real world point can be determined to
lie within a 3x1 pixel strip. Our results can be improved drastically if
we assume we know, from registration, that we are in the correct pixel, but
this is a highly unrealistic assumption.

The analysis described in this section pertains to the problem of
one-dimensional translations. This is not particularly restrictive since
the two-dimensional problem can be easily decomposed into one-dimensional
shift estimates. In the line location estimation problem, we are trying
to locate a real world line $y = mx + b$ in the image. A shift $(\Delta x, \Delta y)$
between real world and image coordinates yields a line
$y = m(x - \Delta x) + b + \Delta y$ in the image. This may be written as
$y = mx + b + (\Delta y - m\Delta x)$,
which is the original line shifted only in the y direction and by an amount
$\Delta y - m\Delta x$. Our 1-d estimation procedures enable us to estimate
$\Delta y - m\Delta x$. Given two lines, we can solve (possibly using least
squares) for $\Delta x$ and $\Delta y$ separately. From this point on, we will
confine ourselves to 1-d shifts.
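
As an illustration of the last step, the following Python sketch recovers $\Delta x$ and $\Delta y$ from the 1-d shifts estimated on two or more lines of different slope. The function name `solve_shift` and the use of an ordinary least-squares fit are assumptions for illustration; the text only states that least squares may be used.

```python
def solve_shift(slopes, offsets):
    """Least-squares (dx, dy) from per-line 1-d shifts d_k = dy - m_k * dx.

    slopes  -- slopes m_k of the matched lines
    offsets -- estimated vertical shifts d_k of those lines
    Hypothetical helper, not taken from the report.
    """
    n = len(slopes)
    if n != len(offsets) or n < 2:
        raise ValueError("need at least two lines with matching offsets")
    sm = sum(slopes)
    smm = sum(m * m for m in slopes)
    sd = sum(offsets)
    smd = sum(m * d for m, d in zip(slopes, offsets))
    det = n * smm - sm * sm
    if abs(det) < 1e-12:
        raise ValueError("slopes must not all be equal")
    # Fit d = dy + (-dx) * m by ordinary least squares.
    reg_slope = (n * smd - sm * sd) / det
    dy = (sd - reg_slope * sm) / n
    return -reg_slope, dy
```

With exactly two lines the fit reduces to solving two linear equations; with more lines the residuals give some indication of how consistent the individual 1-d estimates are.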

The models described in this chapter assume that a set of pixels labelled
as edge pixels is provided by an edge detection procedure. Three cases
are considered. First, the set of edge pixels is exactly the digital
edge corresponding to a line in the real world. This model is unduly
restrictive since an edge which comes very near a pixel boundary can show
up in the next pixel due to noise. Second, we consider a model in which
the set of edge pixels given is a subset of the digital edge corresponding
to the real edge. This approach is more realistic since it enables us to
discard some pixels whose classification as edge pixels is in doubt.
Finally, we give a model in which some pixels lying on the digitization
of the real edge are given and some incorrect pixels are given.

For the first model, in which a complete digital edge is available,
a tight upper bound for the registration error as a function of the line
parameters is given. Probabilistic error estimates are underway but we
have not completed these calculations. For the second model, in which
some pixels may be missing from the digital edge, we give a procedure
which can, given any subset of a digital line, produce a tight upper
bound on the registration error and the expected error. As the number of
subsets of digital lines is large, a complete description of the error
as a function of subset parameters is not readily available. We are
currently working on analytical results to eliminate this problem. The
third model has not yet been explored.

The three geometric models can be extended to include additional
information such as gradient values. For the present, we decided that
the additional complexities added by this information would make analysis
extremely difficult. By first developing the simpler geometric models,
we obtain a standard for subpixel accuracy which can provide a firm basis
for such extensions. The reliability of digital edges extracted from real
imagery is not considered in this report, though it is clearly important
in applying the geometric accuracy results. Future work using the Landsat
data base will be directed toward establishing the reliability of edge pixels.


Section 2.1 Digital Straight Line Segment Parameter Estimation

Estimation of the location parameters of a real world edge giving rise
to an image edge is discussed in this section. The ideas discussed are
a summary of those parts of [Do-Sm] which are useful for subpixel
registration. Their basic result is a determination of all lines whose
digitization is a specified chain code. In later sections, this set of lines
will be used to derive error bounds on registration accuracy.

Several line digitization procedures are commonly used in graphics and
image processing. Given a line segment in the upper right hand quadrant
of the plane, with slope and y-intercept both between 0 and 1, we define
its digitization as follows. To each intersection (a,b) between the line
and a line x = a, a an integer, we associate the pixel with lower left
hand corner $(a, \lfloor b \rfloor)$ (see Fig. 2.1). The chain code (see Fig. 2.1)
of the sequence of pixels with lower left hand coordinates $(0,b_0), (1,b_1),
\ldots, (N,b_N)$ is the sequence $C_1, \ldots, C_N$ where

$C_i = 0$ if $\lfloor b_i \rfloor = \lfloor b_{i-1} \rfloor$

$C_i = 1$ otherwise

The restrictions on the slope and y-intercept of the lines under
consideration are made for simplicity of presentation. By symmetry the results can
be extended to remove these conditions.
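
The digitization and chain code just defined can be sketched in a few lines of Python. The function names are illustrative only, and the particular line used in the usage note (slope 3/10, y-intercept 1/10) is a hypothetical choice that happens to reproduce the pixel sequence listed in the caption of Fig. 2.1; the line actually drawn in that figure is not specified in the text. Exact rational arithmetic is used so that lines passing through grid points are handled without rounding error.

```python
from fractions import Fraction
from math import floor

def digitize(slope, intercept, n):
    """Lower left-hand pixel corners (x, floor(y)) for x = 0..n on
    the line y = slope * x + intercept."""
    return [(x, floor(slope * x + intercept)) for x in range(n + 1)]

def chain_code(pixels):
    """Chain code string: '0' where the pixel row is unchanged from
    one column to the next, '1' otherwise."""
    return "".join("0" if r1 == r0 else "1"
                   for (_, r0), (_, r1) in zip(pixels, pixels[1:]))
```

For example, `digitize(Fraction(3, 10), Fraction(1, 10), 5)` gives the pixels (0,0), (1,0), (2,0), (3,1), (4,1), (5,1), and `chain_code` applied to that list gives `00100`.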

Figure 2.1 Chain code of a digital line. The digitization of the dark
diagonal line has pixels with lower lefthand vertices (0,0), (1,0),
(2,0), (3,1), (4,1), (5,1). The resulting chain code indicated by the
arrows is 00100.

To determine the lines with specified chain code, it is useful to have
a parameterization of the set of all chain codes of digital line segments
resulting from digitizing the class of lines specified above. In [Do-Sm]
the following parameterization is given. A digital line segment chain
code $(C_1, \ldots, C_N)$ is given by a quadruple of integers (N,p,q,s). N is the
length of the chain code, i.e., the number of 0's and 1's. Next, q is
defined to be the smallest integer such that there exists an extension
$C_{N+1}, C_{N+2}, \ldots$ with $C_1, C_2, C_3, \ldots$ periodic with smallest period q. Define
p to be the number of ones in a period. The fourth parameter, s, provides
a normalization of the chain code for one period. Geometrically, s may
be interpreted as follows. Any chain code corresponds to a line segment
with rational slope. Among all such segments, select the slope p/q with
(p,q) = 1 which has the minimum q. This q is the period. The standard
chain code corresponding to the first period of this chain code is the
chain code of the digitization of the first q pixels of the line through the
origin, y = (p/q)x. The ith element, $C_i$, of the standard chain code is given by

$C_i = \lfloor i p/q \rfloor - \lfloor (i-1) p/q \rfloor, \quad i = 1, 2, \ldots, q$

The parameter s, of a code string of length N, is defined by the condition
that the standard code string of p/q starts at the (s + 1)th element of
the original chain code. Given the parameters N,q,p,s of a codestring,
the ith element of the original codestring can be obtained by

$C_i = \lfloor (i-s) p/q \rfloor - \lfloor (i-s-1) p/q \rfloor, \quad i = 1, 2, \ldots, N$

The parameters satisfy the constraints $0 \le p \le q \le N$ and $0 \le s \le q - 1$.

A point which will be particularly important for the registration problem
is that there are constraints on the parameters other than the above
inequalities. These additional constraints, described in Section 2.4, appear
to be rather complicated. Our interest in these matters stems from the
need to enumerate the digital lines satisfying various conditions. If
not for these messy constraints, the enumeration problems would often be
straightforward. Without these additional constraints, for fixed N we
would obtain all digital line segments of length N by independently
varying s,p,q subject to the constraints $0 \le p \le q \le N$ and $0 \le s \le q - 1$.

We  now  give  an  example  of  the  computation  of  the  parameters  for  a 
chain  code. 

EXAMPLE. chain code 10010100

N = 8: there are 8 digits in the code

q = 5: the above code is part of the infinite code
...100101001010010...

p = 2: the number of 1's in the period 10010 is 2

s = 1: the standard codestring of 2/5 is 00101. The
standard codestring starts at the 2nd element
of the chain code. Hence s = 1.
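
The computation in the example can be mirrored by a short brute-force Python sketch. This is an illustrative procedure, not the method of [Do-Sm]: the names are hypothetical, q is found by testing each candidate period of the finite string in turn, and s is found by trying each value until the reconstruction formula for $C_i$ reproduces the input.

```python
def code_element(i, q, p, s=0):
    """i-th element (1-based) of the codestring with parameters q, p, s,
    using floor((i - s) p / q) - floor((i - s - 1) p / q)."""
    return ((i - s) * p) // q - ((i - s - 1) * p) // q

def chain_code_parameters(code):
    """Return (N, q, p, s) for a chain code given as a string of '0'/'1'."""
    bits = [int(c) for c in code]
    n = len(bits)
    if n == 0:
        raise ValueError("empty chain code")
    # Smallest q for which the finite string repeats with period q.
    for q in range(1, n + 1):
        if all(bits[i] == bits[i + q] for i in range(n - q)):
            break
    p = sum(bits[:q])  # number of ones in one period
    # s is the shift for which the reconstruction matches the input.
    for s in range(q):
        if all(code_element(i, q, p, s) == bits[i - 1] for i in range(1, n + 1)):
            return n, q, p, s
    raise ValueError("not the chain code of a straight line segment")
```

Applied to the string `10010100` this returns (8, 5, 2, 1), in agreement with the example. Python's `//` operator rounds toward minus infinity, so it implements the floor correctly for the negative arguments that arise when i is less than or equal to s.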

The primary result of [Do-Sm] is a description of the set of all lines
whose digitization over the x-interval [0,N] is a set of pixels specified
by a chain code. This result is of great importance for our registration
accuracy results since it provides a hold on the errors which may arise by
approximating the true edge by a feasible edge. The set of lines is
described by a quadrilateral in the (e,a)-plane where e is the y-intercept
of a line and a is the slope. The proof of the following formula has not
yet appeared [Do] so we shall only present the result, which is all we
need for the current work. Define functions F and L by:

$F(s) = s - \lfloor s/q \rfloor q$

and $L(s) = s + \lfloor (N - s)/q \rfloor q$

and let $\ell$ be defined by the equation:

$1 + \lfloor \ell p/q \rfloor - \ell p/q = 1/q$ and $0 < \ell < q$.

The set of feasible lines is a quadrilateral in (e,a)-space with vertices
A, B, C, D given by:

$A = (\lceil F(s)p/q \rceil - F(s)p_+/q_+, \; p_+/q_+)$

$B = (\lceil F(s)p/q \rceil - F(s)p/q, \; p/q)$

$C = (1 + \lfloor F(s+\ell)p/q \rfloor - F(s+\ell)p/q, \; p/q)$

$D = (1 + \lfloor F(s+\ell)p/q \rfloor - F(s+\ell)p_-/q_-, \; p_-/q_-)$

where

$q_+ = L(s+\ell) - F(s)$, $p_+ = (pq_+ + 1)/q$

$q_- = L(s) - F(s+\ell)$, $p_- = (pq_- - 1)/q$
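
A minimal Python sketch of the vertex formulas follows, using exact rational arithmetic. The function name is illustrative, and two choices are assumptions rather than part of the stated result: the search for $\ell$ starts at 0 so that the degenerate case q = 1 is covered, and no check is made that (N,q,p,s) actually describes a digital straight line segment.

```python
from fractions import Fraction
from math import ceil, floor

def feasible_vertices(N, q, p, s):
    """Vertices A, B, C, D of the feasible quadrilateral, each returned
    as an (intercept e, slope a) pair of exact rationals."""
    r = Fraction(p, q)

    def F(t):
        return t - (t // q) * q

    def L(t):
        return t + ((N - t) // q) * q

    # ell solves 1 + floor(ell*p/q) - ell*p/q = 1/q (assumes gcd(p, q) = 1).
    ell = next(k for k in range(q)
               if 1 + floor(k * r) - k * r == Fraction(1, q))
    q_plus = L(s + ell) - F(s)
    p_plus = Fraction(p * q_plus + 1, q)
    q_minus = L(s) - F(s + ell)
    p_minus = Fraction(p * q_minus - 1, q)
    lower = ceil(F(s) * r)             # ceiling term shared by A and B
    upper = 1 + floor(F(s + ell) * r)  # 1 + floor term shared by C and D
    A = (lower - F(s) * p_plus / q_plus, p_plus / q_plus)
    B = (lower - F(s) * r, r)
    C = (upper - F(s + ell) * r, r)
    D = (upper - F(s + ell) * p_minus / q_minus, p_minus / q_minus)
    return A, B, C, D
```

For the example chain code 10010100, with (N,q,p,s) = (8,5,2,1), this sketch gives $\ell = 2$ and the vertices A = (4/7, 3/7), B = (3/5, 2/5), C = (4/5, 2/5), D = (1, 1/3). The intercepts of B and C differ by 1/5 = 1/q.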
The above expressions for the vertices of the feasible quadrilateral
will be discussed in greater detail in later sections. A generalization
of the above result to subsets of a digital line will be presented,
though the manner in which it can be reduced to the above formula is
unclear.


Section 2.2 Feasible Region Shape

The description of the set of all lines whose digitization is a
specified chain code of a straight line segment will now be used to obtain
a worst-case bound on the subpixel accuracy with which we can locate a
point in the image. We will show that given a period q chain code of
the digitization of a straight line segment, there exists a real number
x such that the total spread on y-values at the point x of all line
segments with the given chain code is 1/q (see Fig. 2.2). Thus by selecting
the midpoint of this set of (x,y)'s we have estimated the position of a
point on the line to within an error of 1/(2q). This provides our error
bound. In Section 2.5, we will examine the distribution of 1/(2q)
corresponding to a probability distribution on lines.

To see the correctness of the 1/q spread, we first observe that lines
B and C of the feasible region (Sec. 2.1) are parallel, each with slope p/q.
We show that their vertical separation is 1/q. These lines may be thought
of as providing a channel where we can find x values where the spread is
1/q. Next, the relationship between the location of the feasible region
vertices in (e,a)-space and the location of points on possible real line
segments with the appropriate digitization is established. This will
yield a polyhedral region in (x,y)-space which is the union of all feasible
lines. Finally, we show that there exists a real number x such that the
extent of the feasible region over x is determined only by the lines B
and C, hence is of width 1/q.

The proof that B and C are 1/q units apart vertically is now given.
In the case of the infinite digital line, the calculation that the spread
is 1/q everywhere is straightforward. By passing to the finite case, we
introduce boundary effects which cause the spread to be greater near the
ends of the chain code, but the following proposition shows that at least
one point of the 1/q width channel is preserved.

PROP. 2.1 Using the notation of Sec. 2.1, let B and C be the vertices of
the feasible region for a chain code with parameters (N,q,p,s)
corresponding to a straight line segment. Then the difference
of the y-intercepts of the lines corresponding to C and B is
1/q.

PROOF. Let W denote the difference in the y-intercepts. Then W is given by

$W = 1 + \lfloor F(s+\ell)p/q \rfloor - F(s+\ell)p/q - \lceil F(s)p/q \rceil + F(s)p/q$

By definition,

$F(s+\ell) = s + \ell - \lfloor (s+\ell)/q \rfloor q$

Since $0 \le s \le q - 1$ and $0 < \ell < q$, we have $0 < s + \ell < 2q$ and $F(s) = s$.
Thus

$\lfloor (s+\ell)/q \rfloor = 0$ if $s + \ell < q$

$\lfloor (s+\ell)/q \rfloor = 1$ if $s + \ell \ge q$

We examine these two cases separately.

Case (1): $s + \ell < q$

$F(s+\ell) = s + \ell$

Thus: $W = 1 + \lfloor (s+\ell)p/q \rfloor - (s+\ell)p/q - \lceil sp/q \rceil + sp/q$

(As an aside, we note that if s = 0, i.e., we normalize the position
of the chain code, then W = 1/q follows immediately from the
definition of $\ell$.) To simplify the expression for W, we recall
the definition of $\ell$:

$1 + \lfloor \ell p/q \rfloor - \ell p/q = 1/q$

$\ell p/q = 1 + \lfloor \ell p/q \rfloor - 1/q$

$(s+\ell)p/q = sp/q + \ell p/q$

$= sp/q + 1 + \lfloor \ell p/q \rfloor - 1/q$

$\lfloor (s+\ell)p/q \rfloor = 1 + \lfloor \ell p/q \rfloor + \lfloor sp/q - 1/q \rfloor$

$\lfloor (s+\ell)p/q \rfloor - (s+\ell)p/q = \lfloor sp/q - 1/q \rfloor - sp/q + 1/q$

Hence $W = 1 + \lfloor sp/q - 1/q \rfloor - sp/q + 1/q - \lceil sp/q \rceil + sp/q$

$W = 1 + \lfloor sp/q - 1/q \rfloor - \lceil sp/q \rceil + 1/q$

To complete our evaluation of W, we consider two subcases.

Subcase (1): sp/q is not an integer. In this case,
$\lceil sp/q \rceil = \lfloor sp/q \rfloor + 1$. Thus substituting into W, we have

$W = 1 + \lfloor sp/q - 1/q \rfloor + 1/q - \lfloor sp/q \rfloor - 1$

$= \lfloor sp/q - 1/q \rfloor - \lfloor sp/q \rfloor + 1/q$

If sp < q, then (sp - 1)/q < 1 and sp/q < 1, so we get W = 1/q.
The situation where sp/q is an integer is considered in Subcase
(2), so we may assume sp > q, sp/q is not an integer. Hence, there
exists an integer r, $1 \le r < q$, and an integer k > 0 such that

$sp = kq + r$

$sp/q = k + r/q$

Thus $\lfloor sp/q \rfloor = k$

$sp/q - 1/q = k + (r - 1)/q$

Since $0 \le r - 1 < q$, we see that

$\lfloor sp/q - 1/q \rfloor = k$

Thus $\lfloor sp/q - 1/q \rfloor - \lfloor sp/q \rfloor = 0$

Hence W = 1/q

Subcase (2): sp/q is an integer

We have $\lceil sp/q \rceil = \lfloor sp/q \rfloor$. Then

$W = 1 + \lfloor sp/q - 1/q \rfloor + 1/q - \lfloor sp/q \rfloor$

Since sp/q is an integer, $\lfloor sp/q - 1/q \rfloor = sp/q - 1 = \lfloor sp/q \rfloor - 1$

Thus W = 1/q

Case (2): $s + \ell \ge q$

Using $\lfloor (s+\ell)/q \rfloor = 1$ and $F(s+\ell) = s + \ell - q$, we get

$W = 1 + \lfloor (s+\ell-q)p/q \rfloor - (s+\ell-q)p/q - \lceil sp/q \rceil + sp/q$

$\lfloor (s+\ell-q)p/q \rfloor = \lfloor (s+\ell)p/q \rfloor - p$

$(s+\ell-q)p/q = (s+\ell)p/q - p$

Thus $W = 1 + \lfloor (s+\ell)p/q \rfloor - (s+\ell)p/q - \lceil sp/q \rceil + sp/q$

At this point, the arguments of Case (1) can be applied and we get
W = 1/q.

Figure 2.3 Intersections for the feasible region. The four boundary
lines A, B, C, and D of a feasible region are shown. The intersection
of A and D always lies between the parallel lines B and C. These lines
in the x,y space correspond to the vertices A,B,C,D of the feasible
quadrilateral in the (e,a) parameter space.
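
The proposition can also be checked numerically. The Python sketch below evaluates W directly from its definition for every slope p/q in lowest terms and every shift s, up to a chosen bound on q. It is an illustrative check only, not part of the proof; the function names and the decision to let the search for $\ell$ start at 0 (so that q = 1 is included) are assumptions.

```python
from fractions import Fraction
from math import ceil, floor, gcd

def channel_width(q, p, s):
    """W = difference of the y-intercepts of C and B, computed exactly."""
    r = Fraction(p, q)

    def F(t):
        return t - (t // q) * q

    # ell solves 1 + floor(ell*p/q) - ell*p/q = 1/q (requires gcd(p, q) = 1).
    ell = next(k for k in range(q)
               if 1 + floor(k * r) - k * r == Fraction(1, q))
    return (1 + floor(F(s + ell) * r) - F(s + ell) * r
            - ceil(F(s) * r) + F(s) * r)

def width_is_one_over_q(max_q):
    """True if W == 1/q for all coprime p/q with q <= max_q and 0 <= s < q."""
    return all(channel_width(q, p, s) == Fraction(1, q)
               for q in range(1, max_q + 1)
               for p in range(0, q + 1) if gcd(p, q) == 1
               for s in range(q))
```

Note that W does not depend on N, which is consistent with the statement that the channel between B and C has the same width for the finite segment as for the infinite digital line.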

We have established that lines B and C are separated by a vertical
distance 1/q. Next we show that, given an x value and the four lines
A, B, C, D evaluated at x, the part of the feasible region lying over x
is the convex hull of the four values.

PROP. 2.2 Let L be a digital line of length N with vertices A, B, C, D
for the corresponding feasible region. Let A, B, C, D
correspond to the equations $y = m_i x + b_i$, i = 1,...,4. For any
$x_0 \in [0,N]$, set $M = \max\{m_i x_0 + b_i \mid i = 1,\ldots,4\}$ and
$P = \min\{m_i x_0 + b_i \mid i = 1,\ldots,4\}$. Then a point $(x_0,y)$ lies on
a line segment with digitization L if and only if $P \le y \le M$.

PROOF. Let $x_0 \in [0,N]$ and let y = mx + b be the line corresponding to any
point in the quadrilateral given by A, B, C, D. Then, since the
quadrilateral is the convex hull of the set A, B, C, D, there exist
real numbers $t_1, t_2, t_3, t_4$ such that the following conditions hold:

1) $0 \le t_i \le 1$ for i = 1,...,4

2) $\sum_{i=1}^{4} t_i = 1$

3) $\sum_{i=1}^{4} (m_i x + b_i) t_i = mx + b$ for each x

Thus $m x_0 + b = \sum_{i=1}^{4} (m_i x_0 + b_i) t_i$

$\le M \sum_{i=1}^{4} t_i$

$= M$

Similarly we have $m x_0 + b \ge P$. Thus any feasible point $(x_0,y)$
satisfies $P \le y \le M$. Now let $y_0 \in [P,M]$. If $y_0 = m_i x_0 + b_i$ for
some i then $y_0$ obviously lies on a feasible line. If $y_0$ is not
one of these four values then there exist i,j such that
$m_i x_0 + b_i \le y_0 \le m_j x_0 + b_j$. Hence there exist $t_i, t_j$ such that
$t_i + t_j = 1$, $0 \le t_i, t_j \le 1$, and
$y_0 = t_i(m_i x_0 + b_i) + t_j(m_j x_0 + b_j)$. Setting the other two t's to
zero we have a quadruple $t_1, \ldots, t_4$ such that
$y_0 = \sum_{i=1}^{4} (m_i x_0 + b_i) t_i$.
Thus $(x_0,y_0)$ lies on the feasible line given by

$y = (t_i m_i + t_j m_j)x + (t_i b_i + t_j b_j)$.

The next step in finding a point $x_0$ at which the feasible region has
height 1/q is to determine the way in which the lines A and D intersect
the parallel lines B and C. We will show there is an interval $[a,b] \subset [0,N]$
such that lines A and D lie between lines B and C over the interval [a,b].
To do this we establish the following facts (see Fig. 2.3). Let I(*,*)
denote the x-coordinate of the intersection of the two arguments.

1) The y-intercept of A is less than or equal to the y-intercept of D

2) The y-intercept of C is less than or equal to the y-intercept of D

3) $I(D,C) \le I(A,C)$

4) $I(A,B) < I(D,B)$

5) $I(D,C) < N$, $I(A,B) < N$

From the diagram, we can see that selecting a = max(I(A,B), I(D,C)) and
b = min(I(A,C), I(B,D)), the feasible region has height 1/q on the interval
[a,b].
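
The construction of [a,b] can be tried out on a concrete case with the following Python sketch. The helper names are illustrative, and the vertex values in the usage note are our own evaluation of the formulas of Sec. 2.1 for the example chain code 10010100, with (N,q,p,s) = (8,5,2,1), not figures quoted from [Do-Sm]. Each line is represented as an (intercept, slope) pair, and Prop. 2.2 is used to measure the height of the feasible region over a given x.

```python
from fractions import Fraction

def intersection_x(line1, line2):
    """x-coordinate where two lines, given as (intercept, slope), meet."""
    (e1, a1), (e2, a2) = line1, line2
    return (e1 - e2) / (a2 - a1)

def spread(lines, x):
    """Height of the feasible region over x: max minus min of the four
    boundary lines evaluated at x (Prop. 2.2)."""
    values = [e + a * x for e, a in lines]
    return max(values) - min(values)

def narrow_interval(A, B, C, D):
    """Interval [a, b] on which A and D lie between the parallel lines B, C."""
    a = max(intersection_x(A, B), intersection_x(D, C))
    b = min(intersection_x(A, C), intersection_x(B, D))
    return a, b
```

With A = (4/7, 3/7), B = (3/5, 2/5), C = (4/5, 2/5), D = (1, 1/3), the intersections are I(A,B) = 1, I(D,C) = 3, I(B,D) = 6 and I(A,C) = 8, so [a,b] = [3,6]. The spread is 1/5 = 1/q throughout that interval and grows to 3/7 at x = 0.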

LEMMA 2.3 The y-intercept of A is less than or equal to the y-intercept of B.

PROOF: Denoting the y-intercepts by $Y_A$ and $Y_B$ we have

$Y_B - Y_A = \lceil F(s)p/q \rceil - F(s)p/q - \lceil F(s)p/q \rceil + F(s)p_+/q_+$

$= F(s)(p_+/q_+ - p/q)$

Since $F(s) = s \ge 0$, we are done if we show $p_+/q_+ - p/q > 0$.
By the definition of $p_+, q_+$,

$p_+/q_+ - p/q = (pq_+ + 1)/(qq_+) - p/q$

$= p/q + 1/(qq_+) - p/q$

$= 1/(qq_+)$

It suffices to show $q_+ > 0$. By definition,

$q_+ = L(s+\ell) - F(s)$

$= s + \ell + \lfloor (N - (s+\ell))/q \rfloor q - s$

$= \ell + \lfloor (N - (s+\ell))/q \rfloor q$

Since $\ell > 0$, we have $q_+ > 0$.

LEMMA 2.4 The y-intercept of D is greater than or equal to the y-intercept
of C.

PROOF: Denoting the y-intercepts by $Y_D$ and $Y_C$ we have, using the same type
of arguments as in the previous lemma,

$Y_D - Y_C = F(s+\ell)(p/q - p_-/q_-)$

$F(s+\ell) = s + \ell - \lfloor (s+\ell)/q \rfloor q$

$= s + \ell$ if $s + \ell < q$

$= s + \ell - q$ if $s + \ell \ge q$

In either case $F(s+\ell) \ge 0$.

$p/q - p_-/q_- = 1/(qq_-)$

We are done if we can show $q_- > 0$. If $q_- < 0$ then $p/q < p_-/q_-$.
This implies the slope and y-intercept of D are greater than the
slope and y-intercept of C. Hence, over the interval [0,N], the
line C lies entirely below the line D and entirely above the line
B. Thus there is a whole neighborhood around the point C in
(e,a) space which lies in the feasible region, contradicting the
fact that C is on the boundary of the feasible region. We
conclude that $q_- \ge 0$. Notice $q_- = 0$ is precluded by the form of the
slope for D. Since $q_- > 0$, we see that $Y_D - Y_C \ge 0$.

LEMMA 2.5 $I(D,C) \le I(A,C)$

PROOF: Given lines $y = m_1 x + b_1$ and $y = m_2 x + b_2$, their intersection
occurs at $x = (b_1 - b_2)/(m_2 - m_1)$. Hence

$$I(A,C) = \frac{1 + \lfloor F(s+\ell)p/q \rfloor - F(s+\ell)p/q - \lceil F(s)p/q \rceil + F(s)p_+/q_+}{p_+/q_+ - p/q}.$$

We consider two cases:

Case (1): $s + \ell < q$

In this case $F(s+\ell) = s + \ell$. Recalling $p_+/q_+ - p/q = 1/(qq_+)$
we have

$$I(A,C) = qq_+\,\bigl(1 + \lfloor (s+\ell)p/q \rfloor - (s+\ell)p/q - \lceil sp/q \rceil + sp_+/q_+\bigr).$$

Subcase (1): $sp/q$ not an integer

$$\lceil sp/q \rceil = \lfloor sp/q \rfloor + 1$$

$$I(A,C) = qq_+\,\bigl(\lfloor (s+\ell)p/q \rfloor - (s+\ell)p/q + sp_+/q_+ - \lfloor sp/q \rfloor\bigr).$$

By the proof of Lemma 2.1 we have

$$I(A,C) = qq_+\,\bigl(\lfloor sp/q - 1/q \rfloor - sp/q + 1/q + sp/q + s/(qq_+) - \lfloor sp/q \rfloor\bigr)$$

$$= qq_+\,\bigl(1/q + s/(qq_+)\bigr)$$

$$= q_+ + s.$$

Subcase (2): $sp/q$ an integer

Once again, using the proof of Lemma 2.1 we obtain

$$I(A,C) = qq_+\,\bigl(1 + \lfloor (s+\ell)p/q \rfloor - (s+\ell)p/q - sp/q + sp/q + s/(qq_+)\bigr)$$

$$= qq_+\,\bigl(1 + \lfloor (s+\ell)p/q \rfloor - (s+\ell)p/q + s/(qq_+)\bigr)$$

$$= qq_+\,\bigl(1 + \lfloor \ell p/q \rfloor - \ell p/q + s/(qq_+)\bigr)$$

$$= qq_+\,\bigl(1/q + s/(qq_+)\bigr)$$

$$= q_+ + s.$$

We now compute $I(D,C)$:

$$I(D,C) = \frac{F(s+\ell)\,(p_-/q_- - p/q)}{p_-/q_- - p/q} = F(s+\ell) = s + \ell - \lfloor (s+\ell)/q \rfloor q = s + \ell,$$

since $s + \ell < q$ in Case (1). Therefore

$$I(A,C) = q_+ + s = s + \ell + \lfloor (N - (s+\ell))/q \rfloor q \ge s + \ell = I(D,C).$$

Case (2): $s + \ell \ge q$

In this case $F(s+\ell) = s + \ell - \lfloor (s+\ell)/q \rfloor q$, so

$$I(A,C) = qq_+\,\bigl(1 + \lfloor (s+\ell)p/q \rfloor - \lfloor (s+\ell)/q \rfloor p - (s+\ell)p/q + \lfloor (s+\ell)/q \rfloor p - \lceil F(s)p/q \rceil + F(s)p_+/q_+\bigr).$$

After cancelling the terms $\lfloor (s+\ell)/q \rfloor p$, we are reduced to Case
(1) and we obtain $I(A,C) = q_+ + s$. As in Case (1),

$$I(D,C) = F(s+\ell) = s + \ell - \lfloor (s+\ell)/q \rfloor q \le s + \ell.$$

From the proof of Case (1), we had $I(A,C) \ge s + \ell$.

Thus $I(A,C) \ge I(D,C)$.

The proof that $I(A,B) \le I(D,B)$ follows the lines of the above proofs
and is omitted. For possible application in later work we give the
intersections

$$I(A,B) = s$$

$$I(D,B) = \begin{cases} s + \ell + q_- & \text{if } s + \ell < q \\ s + \ell + q_- - q & \text{if } s + \ell \ge q. \end{cases}$$

The intersections of A and D with B and C have been computed explicitly
and we can see that

$$I(D,C) < N \quad \text{and} \quad I(A,B) < N.$$

Thus by our earlier remarks we are guaranteed the existence of a real
number $0 \le x \le N$ such that the feasible region over x has height $1/q$.

From the results of this section we may conclude that, given a digital
line with period q in the sensed image such that the underlying real edge
has slope between zero and one, we can determine the vertical offset
between sensed and reference images to an accuracy of $1/(2q)$ pixels.

Section 2.3 Infinite Digital Lines
The feasible region for infinite digital lines is easily computed
using the results of Section 2.2. This analysis is divided into two
parts. For any infinite digital line of period q, we show the channel
consists of two parallel lines, which are a vertical distance $1/q$ apart.
Thus, since the channel extends over the whole x-axis, there is no flaring
at the end as in the finite case. If the infinite digital line is
aperiodic, then we show the channel extends over the whole x-axis, but
consists of a single line. Thus the maximum error is $1/(2q)$ if the
digital line has period q and zero if the digital line is
aperiodic. The aperiodic infinite digital lines are precisely those
infinite digital lines which are the digitizations of lines with irrational
slope. Since the irrationals are a set of measure one in the unit interval,
using the uniform probability measure, we see that the error is zero with
probability one for infinite digital lines.

Before considering the periodic and aperiodic lines separately, we
note that any two infinite lines with the same digitization are parallel.

Let $y = mx + b$ and $y = nx + c$ be two lines. Then the difference, $h(x)$,
in the y values of these lines at x is given by $h(x) = (m-n)x + (b-c)$.

If m and n are not equal then there exists a $K > 0$ such that $|h(x)| > 1$ for all
x such that $|x| > K$. Thus the two lines cannot have the same digitization.
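
One explicit choice of K, offered here only as an illustrative worked step (the constant is not given in the original), follows from the reverse triangle inequality:

$$|h(x)| \ge |m-n|\,|x| - |b-c| > 1 \quad \text{whenever} \quad |x| > K = \frac{1 + |b-c|}{|m-n|}.$$

For instance, with $m - n = 0.01$ and $b - c = 0$ the two lines are more than one pixel apart as soon as $|x| > 100$.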

We now consider the case of infinite digital lines of period q. By
the feasible region description in Section 2.1, the lines corresponding
to the vertices A, B, C, and D of the feasible region in (e,a) space have
slopes $p_-/q_-$, $p/q$, $p_+/q_+$. Fixing p, q, and s and letting N go to
infinity, we see that the above result on the slopes of infinite lines having
the same digitization implies $p_-/q_-$ and $p_+/q_+$ must approach $p/q$. Inserting
these limits into the formulas for the vertices A and D, we see that,
in the limit, A = B and C = D. We have shown in Section 2.2 that B and C
are a vertical distance $1/q$ apart. This establishes the result for
the infinite periodic digital line.

The infinite aperiodic line requires a different approach. We
first cite a version of a classical result [Wa] on lines with irrational
slope. Let $f(x) = mx + b$ be a line with m irrational. Then the set
$\{mx + b - \lfloor mx + b \rfloor : x \text{ is an integer}\}$ is dense in the unit interval.

It has already been shown that two lines with the same digitization have
the same slopes and can only vary in their y-intercepts. Let $\varepsilon > 0$ be given.
Then the digitization, L, of the line $y = mx + b$ (m irrational) is aperiodic,
so there exist integers $K_1$ and $K_2$ such that $mK_1 + b - \lfloor mK_1 + b \rfloor < \varepsilon$
and $mK_2 + b - \lfloor mK_2 + b \rfloor > 1 - \varepsilon$. Thus decreasing b by more than $\varepsilon$ would
change the digitization at $K_1$ and increasing b by more than $\varepsilon$ would change
the digitization at $K_2$. Thus for any $\varepsilon > 0$, we cannot change b by more than
$\varepsilon$ without changing the digitization. Hence b is fixed. Since m is also
fixed, the channel is the single line $y = mx + b$.
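
The density argument can be illustrated numerically. The following is a minimal sketch, not part of the original work: the function name `extreme_fractional_parts`, the slope $\sqrt{2} - 1$, the intercept 0.3 and the search limit are all assumptions chosen for illustration, and floating point only approximates an irrational slope.

```python
import math

def extreme_fractional_parts(m, b, eps, search_limit=100000):
    """Search x = 0, 1, 2, ... for witnesses K1, K2 with
    frac(m*K1 + b) < eps and frac(m*K2 + b) > 1 - eps."""
    k1 = k2 = None
    for x in range(search_limit):
        frac = (m * x + b) % 1.0          # fractional part of the line height
        if k1 is None and frac < eps:
            k1 = x
        if k2 is None and frac > 1.0 - eps:
            k2 = x
        if k1 is not None and k2 is not None:
            return k1, k2
    raise ValueError("no witnesses found below search_limit")

m = math.sqrt(2) - 1   # stands in for an irrational slope between 0 and 1
b = 0.3                # hypothetical y-intercept
for eps in (0.1, 0.01, 0.001):
    k1, k2 = extreme_fractional_parts(m, b, eps)
    print(eps, k1, k2)
```

As eps shrinks, the witnesses move further out along the line, which is why the argument needs the infinite digital line.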
Section 2.4 Invariant Line Measure

A probabilistic analysis of geometric accuracy requires a probability
distribution on the fundamental objects, the lines. It is tempting to
place a uniform distribution on the coefficients of the lines represented
in some parametric form. Unfortunately, there is no canonical
parametrization and the measure will not be uniform with respect to other
parametrizations. A customary escape from this quandary is to impose
some parametrization-independent conditions which single out a probability
measure. In geometric probability problems, one generally assumes
the measure is invariant under translation and rotation of the geometric
figures, in our case the lines. This uniquely determines a coordinate
system, the $(\rho,\theta)$ polar coordinates of a line, in which the distribution
is uniform with respect to the parameters. To avoid the problem of
taking a uniform distribution on an unbounded set, we restrict the
parameters to lie in a bounded set. The measure of this set is to be
normalized to one. The above measure provides a probability measure
on lines whose digitizations belong to any specified set of digital lines.
This induces a probability measure on digital lines which can be used to
perform a probabilistic analysis of geometric accuracy.
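
As an illustration of drawing lines from such a measure, the sketch below samples the polar parameters uniformly from a bounded set and converts them to slope-intercept form. It assumes the normal form $x\cos\theta + y\sin\theta = \rho$ for the polar coordinates of a line; the function names and the bound `rho_max` are hypothetical and do not come from the original.

```python
import math
import random

def sample_line(rho_max, rng):
    """Draw (rho, theta) uniformly from [0, rho_max] x [0, 2*pi)."""
    rho = rng.uniform(0.0, rho_max)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return rho, theta

def to_slope_intercept(rho, theta):
    """Convert x*cos(theta) + y*sin(theta) = rho to y = m*x + b.
    Returns None for (near-)vertical lines, which have no slope."""
    s = math.sin(theta)
    if abs(s) < 1e-12:
        return None
    return -math.cos(theta) / s, rho / s

rng = random.Random(0)            # fixed seed so the sketch is repeatable
kept = []
for _ in range(1000):
    line = to_slope_intercept(*sample_line(10.0, rng))
    # keep only lines with slope between zero and one, as in the text
    if line is not None and 0.0 <= line[0] <= 1.0:
        kept.append(line)
```

A distribution that is uniform in $(\rho,\theta)$ is not uniform in slope and intercept, which is the parametrization dependence the paragraph above warns about.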
Section 2.5 Digital Line-Probabilistic Analysis

A worst case bound on registration accuracy using a digital edge was
developed in Section 2.2. More realistic error information can be obtained
using probability. In this section we consider the question of obtaining
probabilistic information on the registration error assuming the real world
edge giving rise to the digital edge is generated by a natural distribution
on edges. We have procedures for estimating these probabilities,
but due to the considerable computational cost involved in evaluating these
in special cases, we prefer to first seek analytical simplifications.

Many probabilistic questions pertinent to the geometric accuracy question
can be formulated. Several of the most basic are

1) Given a maximum allowed registration error, what is the probability
that the actual error will not exceed this?

2) What are the expected value and the variance of the registration
error?

3) Given a maximum allowed registration error and a maximum allowed
probability of error, what is the largest region of lines (in some
sense) such that lines coming from this region will result in an
acceptable size error an acceptable percentage of the time?

We now turn to an analysis of the first question. We wish to determine,
for any acceptable error level in the estimated offset between sensed and
reference image, the probability that a random edge will result in
a digitization which permits estimation to less than that error level.

Though a formula for these probabilities as a function of digital line
length is not available, a procedure for calculating these probabilities
for any given line length, N, is described and results for the case N = 10
are presented. In addition we present asymptotic upper bounds on the error.

The basic approach to computing the error probabilities is quite simple.
A probability density function is given on the set, A, of all lines with
slope between 0 and 1 going through the pixel with lower left vertex
(0,0). Since a line has only one chain code, the sets of lines with
different chain codes give a partition of the set A. Hence the density on
lines induces a probability distribution on chain codes. For a chain code
with period q, the maximum error is $1/(2q)$, as was shown in Section 2.2.
Thus for any specified error h, we must calculate the probability of the
following set, B, of line chain codes:

$$B = \{(N,q,p,s) : 1/(2q) < h\}$$

The  set  of  all  linear  chain  codes  of  length  N can  be  enumerated.  For  each 
chain  code  in  B,  the  corresponding  feasible  quadrilateral  can  be  calculated 
as  in  Section  2.1.  The  density  function  on  lines  can  then  be  integrated 
over  the  quadrilateral  and  the  sum  of  these  integrals  over  all  members  in 
B computed.  This  sum  yields  the  desired  probability.  A program  to  perform 
these  computations  is  under  development. 

The problem of enumerating linear chain codes was discussed in [Ro-We]
where an algorithm for generating the set of linear chain codes was presented.
We have not found any estimates in the literature on the number of chain
codes joining two points. Since we are initially dealing with very short
lines (e.g. length 10) we have taken a naive but rapidly implementable approach
to the problem of line generation. First, generate a set of real lines
whose digitizations are guaranteed to include all digital lines of specified
length N = 10 and slope between 0 and 1. Next digitize these lines and
finally remove duplicates.

The set of all lines of the form $y = (p/q)x + m/q$, $0 \le p \le q$, $(p,q) = 1$,
$m = 0, \ldots, q-1$, $q = 1, \ldots, N$, together with the line $y = x$, gives rise
to all digital lines of length N with slope between zero and one and going
through the pixel with lower left-hand coordinate (0,0). This follows
from the result proved in Section 2.2 that a digital line segment with
parameters (N,p,q,s) can be extended to an infinite digital line which is
a digitization of a line of slope $p/q$. Thus the digital lines (N,p,q,s)
will be generated if we generate all lines $y = (p/q)x + r$, r real,
$0 \le r < 1$. As r increases from zero to one the chain code of the line can
change only when the line passes through a lattice point. Let (v,w) be
any lattice point through which a line of the form $y = (p/q)x + r$ passes.
Then the height of the line changes by an amount $vp/q$ as x goes from zero
to v. Since the line goes through (v,w), the height at x = 0 must be
$r = w - vp/q$. Rewriting this as $(wq - vp)/q$ and noting that the height must
be between zero and one and that $wq - vp$ is an integer, we see that $r = m/q$
where $0 \le m < q$.
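
The generate, digitize and deduplicate procedure can be sketched as follows. This is an illustrative reconstruction rather than the authors' program: it assumes the chain code is the sequence of differences of $\lfloor (p/q)x + m/q \rfloor$ at consecutive integer x, a convention that reproduces the codes 001, 010 and 100 quoted in the text for slope 1/3. The function names are hypothetical.

```python
from fractions import Fraction
from math import floor, gcd

def chain_code(slope, intercept, n):
    """0/1 chain code of y = slope*x + intercept for x = 0..n,
    taken as differences of the floor of the line height."""
    heights = [floor(slope * x + intercept) for x in range(n + 1)]
    return "".join(str(heights[i + 1] - heights[i]) for i in range(n))

def candidate_lines(n):
    """Yield (slope, intercept) for y = (p/q)x + m/q with q = 1..n,
    0 <= p <= q, gcd(p, q) = 1, m = 0..q-1 (y = x appears as p = q = 1)."""
    for q in range(1, n + 1):
        for p in range(q + 1):
            if gcd(p, q) != 1:
                continue
            for m in range(q):
                yield Fraction(p, q), Fraction(m, q)

def digital_lines(n):
    """Digitize every candidate line and remove duplicate chain codes."""
    return sorted({chain_code(s, b, n) for s, b in candidate_lines(n)})

print(len(list(candidate_lines(10))), len(digital_lines(10)))
```

Exact rational arithmetic is used so that lines passing exactly through lattice points are digitized consistently.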

An upper bound on the number of chain codes of length N with specified
starting pixel and slope between zero and one can be obtained using the
fact that all lines of the form $y = (p/q)x + m/q$, $(p,q) = 1$, $0 \le p \le q$,
$0 \le m \le q-1$, give rise to all chain codes. Using the number-theoretic
function, $\phi(q)$, given by

$\phi(q)$ = number of positive integers $\le q$ and relatively prime to q,

we now derive an upper bound, $L^*(N)$, on the number of digital lines as a
function of the length of the chain code. It is easily seen that

$$L^*(N) = 1 + \sum_{q=1}^{N} q\,\phi(q),$$

which is obtained by counting the number of lines $y = (p/q)x + m/q$ described
above. Unexpectedly, this is not the same as the number of distinct
digital lines. This is due to the fact that, when q is sufficiently close
to N, a line of the form $y = (p/q)x + m/q$ can give rise to a digital line of period
less than q. In fact, for each $q > N/2$, there are lines of the form
$y = (p/q)x + m/q$ which give rise to a digital line of period strictly less
than q. For example, consider the line

$$y = (1/3)x + 1/3.$$

This has a chain code of length 3 given by 010, which has period 2, while
$y = (1/3)x$ and $y = (1/3)x + 2/3$ have chain codes 001 and 100 respectively,
each of period three. More generally, for chain codes of length N, of the
N possible chain codes arising from lines of the form $y = (1/N)x + m/N$,
only the cases m = 0 and m = N - 1 have period N. To see this we note that
the chain codes of the two cases are

m        chain code

0        00...01

N-1      100...0

Any other value of m shifts the one so the chain code has 0's on both ends.
Any chain code with the same digit at both ends automatically has period
less than N. Using the same principle, given any $q \ge N/2 + 1$, there exist
lines of the form $y = (1/q)x + m/q$ which have chain codes of period less than
q. The total number of lines of this form which have period q is N - q + 2.
The situation is considerably more complicated when $p \ne 1$. We can show,
using the above principle, that the function $L_*(N)$ is a lower bound
on the number of lines, where

$$L_*(N) = 1 + \sum_{q=1}^{\lfloor N/2 \rfloor + 1} q\,\phi(q) + \sum_{q=\lfloor N/2 \rfloor + 2}^{N} (N - q + 1)\,\phi(q).$$

For N = 10, the true number of digital lines, L(10), is 136, $L_*(10) = 102$
and $L^*(10) = 218$.
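
The two bounds are simple to evaluate directly. The sketch below uses hypothetical function names and a naive totient, and computes $L^*(N)$ and $L_*(N)$ from the formulas above; for N = 10 it gives the quoted values.

```python
from math import gcd

def phi(q):
    """Euler's totient: number of integers in 1..q relatively prime to q."""
    return sum(1 for k in range(1, q + 1) if gcd(k, q) == 1)

def upper_bound(n):
    """L^*(N) = 1 + sum_{q=1}^{N} q*phi(q)."""
    return 1 + sum(q * phi(q) for q in range(1, n + 1))

def lower_bound(n):
    """L_*(N): full count q*phi(q) up to floor(N/2)+1, then (N-q+1)*phi(q)."""
    split = n // 2 + 1
    head = sum(q * phi(q) for q in range(1, split + 1))
    tail = sum((n - q + 1) * phi(q) for q in range(split + 1, n + 1))
    return 1 + head + tail

print(lower_bound(10), upper_bound(10))
```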

We have derived an upper bound and a lower bound for the number of
lines. Using $L^*$ and $L_*$, we can develop asymptotic upper and lower bounds
respectively for the expectation of the maximum registration error per
chain code.

We now show that $L_*$ is actually a lower bound on the number of lines.

PROP. 2.6 $L_*(N)$ is a lower bound for the number of digital lines of length
N with $0 \le p/q \le 1$.

PROOF: Clearly the period of a chain code is greater than or equal to that of
a subchain code. Consequently, given a real line which is digitized
over a segment of length N, the period doesn't diminish when we
extend the interval. Thus the period of $y = (p/q)x$ is q when
$(p,q) = 1$ and $q \le N$. Recall that all digital lines (N,q,p,s)
(even those of period < q) are generated by digitizing lines of the
form $y = (p/q)x + m/q$ where $0 \le p \le q$, $(p,q) = 1$, $0 \le m < q$.
Changing m produces a permutation of the chain code within the first
q elements, the second q elements, etc. The line (N,q,p,s) has
the standard chain code starting at the (s + 1)st place. Hence,
as long as $s + q \le N$, one gets a full standard chain code as a
subchain of the original chain code. Thus the original code has
period at least q, and thus exactly q. Consequently, for
a given N and q and for each p, there are at least $\min(q, N-q+1)$ chain
codes of period q. This follows from the observation that as
long as $s + q \le N$, a standard chain code is present in the chain
code.

When $2q - 1 \le N$, all possible s satisfy $s + q \le N$, so
shifting (N,q,p,0) by modifying s, all lines have period q. For
$2q - 1 \le N$, one obtains the count

$$1 + \sum_{q \ge 1,\ 2q \le N+1} q\,\phi(q)$$

for these lines. This gives the first two terms in the definition
of $L_*$. When $2q > N + 1$ we get at least N - q + 1 lines of period
q. This contributes

$$\sum_{q=\lfloor N/2 \rfloor + 2}^{N} (N - q + 1)\,\phi(q)$$

lines, which is the last term in $L_*$.

We know that $L_*$ is not a sharp lower bound. On the other hand, when p = 1
and $2q \ge N + 1$, a sharp lower bound is N - q + 2. Furthermore, the
fact that $q_-$ is positive provides a necessary constraint on the possible
s for a given N, q, and p:

s + £ < q implies  N - s ^ q 
where $\ell$ is given in Section 2.1 and satisfies

$$0 < \ell < q, \qquad \ell p \equiv -1 \pmod{q}.$$

We don't know whether or not this last condition is sufficient to
determine the possible s.

A form of expected error will now be defined. Let C(N,q) denote the
number of digital lines of length N and period q, and let L(N) denote the
total number of lines of length N given by

$$L(N) = \sum_{q=1}^{N} C(N,q).$$

Define

$$S(N) = 1 + \sum_{q=2}^{N} C(N,q)/q.$$

The expected value of the channel width
(i.e., twice the maximum error in our estimation), E(N), is given by

$$E(N) = S(N)/L(N).$$

This expected value is with respect to a distribution on digital lines in
which all lines are equally likely, rather than using the invariant
probability measure on real lines. The invariant one is difficult to handle in
evaluating the probability of the set of feasible lines corresponding to
a chain code. Preliminary computations indicate that all digital lines
have similar probabilities with respect to the invariant measure. For
fixed N, the probabilities using the invariant measure can be computed
exactly. To get a rough estimate of the probabilities, the invariant
measure can be replaced by uniform measure.

The expected maximum estimation error can be computed asymptotically.

PROP. 2.7 Up to an error term $O(\log N/N^2)$, the following holds:

$$\frac{5}{4N} \le E(N) \le \frac{7}{3N}$$

PROOF: We now compute asymptotic formulas for $L^*(N)$ and $L_*(N)$. Recall that
if $\mu$ is the Moebius function [HW] then

$$\phi(q) = q \sum_{d \mid q} \mu(d)/d.$$

From [HW] we also obtain

$$\Phi(N) = \sum_{j=1}^{N} \phi(j) = \frac{3N^2}{\pi^2} + O(N \log N).$$

For any N,

$$L^*(N) = 1 + \sum_{q=1}^{N} q\,\phi(q) = 1 + \sum_{q=1}^{N} q^2 \sum_{d \mid q} \mu(d)/d.$$

We now write $q = dd'$ and substitute in the last term:

$$L^*(N) = 1 + \sum_{dd' \le N} d^2 (d')^2 \mu(d)/d = 1 + \sum_{d=1}^{N} d\,\mu(d) \sum_{d' \le N/d} (d')^2.$$

The term $\sum_{d' \le N/d} (d')^2 = \tfrac{1}{3}(N/d)^3 + O(N^2/d^2)$. Inserting this in
$L^*(N)$, we obtain

$$L^*(N) = \frac{N^3}{3} \sum_{d=1}^{N} \mu(d)/d^2 + O(N^2 \log N).$$

Note we have used

$$\Bigl|\sum_{d=1}^{N} d\,\mu(d)\,N^2/d^2\Bigr| \le N^2 \sum_{d=1}^{N} 1/d = O(N^2 \log N).$$

But $\sum_{d=1}^{\infty} \mu(d)/d^2 = 6/\pi^2$ [HW].

Hence $\sum_{d=1}^{N} \mu(d)/d^2 = 6/\pi^2 + O(1/N)$. Substituting this into $L^*(N)$, we get

$$L^*(N) = 2N^3/\pi^2 + O(N^2 \log N).$$

We now get an asymptotic formula for $L_*(N)$:

$$L_*(N) = 1 + \sum_{q=1}^{(N/2)+1} q\,\phi(q) + \sum_{q=(N/2)+2}^{N} (N + 1 - q)\,\phi(q).$$

Using the formula for $L^*(N/2)$, we obtain

$$L_*(N) = \frac{N^3}{4\pi^2} + O(N^2 \log N) + (N+1) \sum_{q=(N/2)+2}^{N} \phi(q) - \sum_{q=(N/2)+2}^{N} q\,\phi(q).$$

But

$$\sum_{q=(N/2)+2}^{N} \phi(q) = \Phi(N) - \Phi(N/2 + 1) = \frac{3N^2}{\pi^2} - \frac{3N^2}{4\pi^2} + O(N \log N) = \frac{9N^2}{4\pi^2} + O(N \log N).$$

Using

$$\sum_{q=(N/2)+2}^{N} q\,\phi(q) = L^*(N) - L^*(N/2 + 1) = \frac{2N^3}{\pi^2} - \frac{2}{\pi^2}\Bigl(\frac{N}{2}\Bigr)^3 + O(N^2 \log N) = \frac{7N^3}{4\pi^2} + O(N^2 \log N),$$

we finally get

$$L_*(N) = \frac{3N^3}{4\pi^2} + O(N^2 \log N).$$


We now proceed to give an upper bound for E(N). Notice that

$$\Phi(N) = \sum_{q=1}^{N} \frac{1}{q}\, q\,\phi(q) \ge S(N) + (L^*(N) - L(N))/N$$

and

$$\Phi(N)/L^*(N) = \frac{3}{2N} + O(\log N/N^2) \approx \frac{3}{2N}.$$

From now on we neglect errors of the form $O(\log N/N^2)$. From these
observations we have

$$S(N)/L^*(N) + (L^*(N) - L(N))/(N L^*(N)) \le \frac{3}{2N},$$

$$(L^*(N) - L(N))/(N L^*(N)) = 1/N - (1/N)\,L(N)/L^*(N).$$

On the other hand, by definition

$$S(N) = E(N)\,L(N).$$

We conclude

$$(E(N) - 1/N)\,L(N)/L^*(N) + 1/N \le \frac{3}{2N}.$$

We now estimate, from below, the term $L(N)/L^*(N)$:

$$L(N)/L^*(N) \ge L_*(N)/L^*(N) = 3/8.$$

Thus

$$(E(N) - 1/N)\,\tfrac{3}{8} + 1/N \le \frac{3}{2N},$$

$$(E(N) - 1/N)\,\tfrac{3}{8} \le \frac{1}{2N},$$

$$E(N) \le \frac{4}{3N} + \frac{1}{N} = \frac{7}{3N}.$$

We now have an asymptotic upper bound

$$E(N) \le \frac{7}{3N} + O(\log N/N^2).$$
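It is clear that $E(N) \ge 1/N$. Let $S^*(N) = \Phi(N)$. Then we have
$N S^*(N/2) \ge L^*(N/2)$ (since in $S^*(N/2)$ we divide by q, $q \le N/2$), and
if $a \ge b > 0$ then $(a + x)/(b + x)$ is nonincreasing in $x \ge 0$ (taking derivatives).
Hence

$$\begin{aligned}
E(N) &= \frac{S^*(N/2) + \sum_{q=N/2+1}^{N} C(N,q)/q}{L^*(N/2) + L(N) - L^*(N/2)} \\
&\ge \frac{S^*(N/2) + \tfrac{1}{N}\,(L(N) - L^*(N/2))}{L^*(N/2) + L(N) - L^*(N/2)} \\
&= \frac{1}{N}\,\frac{N S^*(N/2) + L(N) - L^*(N/2)}{L^*(N/2) + L(N) - L^*(N/2)} \\
&\ge \frac{1}{N}\,\frac{N S^*(N/2) + L^*(N) - L^*(N/2)}{L^*(N/2) + L^*(N) - L^*(N/2)} \\
&= \frac{S^*(N/2) + \tfrac{1}{N}\,(L^*(N) - L^*(N/2))}{L^*(N)} \\
&\approx \frac{(3/\pi^2)(N^2/4) + \tfrac{1}{N}\bigl((2N^3/\pi^2) - (2/\pi^2)(1/8)N^3\bigr)}{2N^3/\pi^2} \\
&= \frac{5}{4N}.
\end{aligned}$$

So we get $5/(4N) \le E(N) + O(\log N/N^2)$.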

Several limitations on the utility of the calculations should be emphasized.
Of the two limitations to be described, one tends to make the error
estimate low while the second makes it high. The extent to which these
factors may influence our estimates has not yet been determined; however,
we are currently working on extensions of our methods to provide more
realistic estimates. The strongest assumption lowering accuracy in the use
of the above methods is that the edge pixels on the digital edge are known
exactly. A weakening of this assumption is discussed in Section 2.6. On
the other hand, our calculations have provided bounds and the expected
value for the maximum error per digital line. We expect the average error
to be much less.

We now turn to a discussion of the finite sample behavior of the
error. A closed form expression for the statistics of the error as a
function of code length is difficult to derive. In order to get some
feeling for the error we computed the maximum error $1/(2q)$ associated with
each digital line of period q. This error represents the maximum error
in estimating the registration offset, given the digital line of period
q. The errors were calculated for all digital lines of length ten with
the usual slope and origin conditions. There are 136 digital lines of
length ten. Table 2.1 provides a summary of our results. The first
entry, ERROR, in each row is a registration error and the second entry
represents the probability that the maximum error exceeds ERROR.

This number is obtained as follows. Given a value ERROR, we compute
the total number of digital lines for which $1/(2q) >$ ERROR. This number is
then divided by 136, the total number of digital lines, to determine the
fraction of digital lines with $1/(2q) >$ ERROR. Thus we see from the
table that the registration error exceeds 0.25 pixels in less than 2%
of the digital lines of length ten. Similarly, the error exceeds one
tenth of a pixel in less than 14% of the digital lines.

The information in Table 2.1 provides exact probabilities (except
for rounding error) for the digital lines of length ten. Given any
longer digital line, it contains a subsegment of length ten, so these
results provide worst case bounds on the maximum error for longer
lines. It should be noted that the assumption that the digital lines
are equally probable is not as plausible as the assumption that the
probability measure on real lines is rotation and translation invariant.
The calculation under the invariant measure will be performed in the
follow-on work, but we do not expect the results to differ greatly. We
also note that the worst possible error $1/(2q)$ was assumed for each
digital line. The expected error over all real lines giving rise to the
digital line will be much smaller.

We conclude that a very high level of subpixel accuracy is attainable
in the restricted model discussed in this section. Furthermore,
the calculated variation in error with line period provides a good criterion
for selecting features for registration. Future work will determine the
extent to which this accuracy diminishes as we examine looser models.


ERROR     PROBABILITY (MAX ERROR > ERROR)

0.5000    0.0000
0.2500    0.0147
0.1666    0.0294
0.1250    0.0735
0.1000    0.1323
0.0833    0.2794
0.0714    0.3676
0.0625    0.6323
0.0555    0.7794
0.0500    0.9412
0.0000    1.0000

Given an entry, a, in the first column, the corresponding entry
in the second column is the fraction of digital lines of
length ten whose maximum registration error exceeds a.

Line length = 10

Table 2.1 Error Probabilities for digital lines without points missing.
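
The entries of Table 2.1 can be recomputed by brute force. The sketch below is an illustrative reconstruction under stated assumptions, not the authors' program: the chain code is taken as the differences of the floor of the line height at integer x, the period of a code is its smallest period as a string, and all function names are hypothetical. It also evaluates E(N) = S(N)/L(N) from the definition given earlier.

```python
from collections import Counter
from fractions import Fraction
from math import floor, gcd

def chain_code(slope, intercept, n):
    """Chain code as differences of floor(slope*x + intercept), x = 0..n."""
    heights = [floor(slope * x + intercept) for x in range(n + 1)]
    return tuple(heights[i + 1] - heights[i] for i in range(n))

def digital_lines(n):
    """Distinct chain codes of the lines y = (p/q)x + m/q, q = 1..n."""
    codes = set()
    for q in range(1, n + 1):
        for p in range(q + 1):
            if gcd(p, q) == 1:
                for m in range(q):
                    codes.add(chain_code(Fraction(p, q), Fraction(m, q), n))
    return codes

def period(code):
    """Smallest q with code[i] == code[i+q] wherever both are defined."""
    n = len(code)
    for q in range(1, n + 1):
        if all(code[i] == code[i + q] for i in range(n - q)):
            return q

def exceedance_table(n):
    """Rows (ERROR, fraction of lines with 1/(2q) > ERROR), all lines equally likely."""
    counts = Counter(period(c) for c in digital_lines(n))
    total = sum(counts.values())
    rows, below = [], 0
    for q in range(1, n + 1):
        rows.append((1 / (2 * q), below / total))  # lines with period < q
        below += counts[q]
    rows.append((0.0, below / total))
    return counts, rows

def expected_channel_width(counts, n):
    """E(N) = S(N)/L(N) with S(N) = 1 + sum_{q>=2} C(N,q)/q."""
    s = 1 + sum(Fraction(counts[q], q) for q in range(2, n + 1))
    return float(s / sum(counts.values()))

counts, rows = exceedance_table(10)
for error, prob in rows:
    print(f"{error:.4f}  {prob:.4f}")
print(expected_channel_width(counts, 10))
```

For N = 10 the computed E(N) falls between 5/(4N) and 7/(3N), although those bounds are asymptotic and carry an error term.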
Section 2.6 Digital Lines - Points Missing

The determination of the exact set of pixels lying on the digitization
of a real-world edge is not feasible, due to noise and geometric
distortions. In this section, we relax the condition that a
digital line segment be available to the weaker condition that only a
subset of a digital line be detected. This situation is likely to
arise if we try to fit a real line to suspected edge pixels and select
those edge pixels for which the difference in areas between the two
parts of the pixel separated by the line is not great. These pixels
are more likely to be correct edge pixels. As we are unlikely to be
able to guarantee the correctness of our pixels, this approach is
restrictive. We think, however, that this work will provide a basis for
the analysis of the more complex case in which incorrect pixels are
present. This section describes methods for the analysis of the
registration accuracy attainable by estimating the position of a real line
using a subset of a digital line. Computer programs to estimate this
accuracy are currently under development.

The feasible line region for a subset of a
digital line does not appear to be easily described in terms of
parameters characterizing the subset. A simple observation leads to a
method for calculating this feasible region in any particular case.

We note that a line with slope between zero and one traversing a pixel
must cross the main diagonal of the pixel (see Figure 2.4). Given a
subset of a digital line, the set of feasible lines is exactly the set
of lines crossing the main diagonal of each pixel in the subset. Let
$S = \{s_1, \ldots, s_n\}$ be a subset of a digital line of slope between zero and
one and let $s_1$ and $s_n$ be the leftmost and rightmost pixels respectively.
Let the lower lefthand vertices of $s_1$ and $s_n$ be $(x_1,y_1)$ and $(x_n,y_n)$.
Then it can be shown that any line whose digitization contains these
two pixels is a convex combination of the lines $L((x_1+1,y_1),(x_n,y_n))$,
$L((x_1+1,y_1),(x_n,y_n+1))$, $L((x_1,y_1+1),(x_n,y_n))$, and $L((x_1,y_1+1),(x_n,y_n+1))$.
Thus the feasible region is a quadrilateral in y-intercept, slope space.

Figure 2.4 Intersection of a line with the main diagonal of a pixel.
This intersection is used to derive constraints on the feasible set.


Each additional pixel which our feasible lines are constrained
to pass through restricts us to a subset of the feasible quadrilateral,
namely the subset consisting of all lines passing through the main
diagonal of the intermediate pixel. Let L be a line passing through $s_1$, $s_n$
and an intermediate pixel $s_i$. Assume L is in the interior of the
feasible quadrilateral for $s_1$ and $s_n$. Then any sufficiently small change in
the slope and y-intercept of L will keep it in the feasible region. If
L does not enter $s_i$ at a vertex of $s_i$, then a sufficiently small change
in its slope and y-intercept will not change the fact that $s_i$ is in its
digitization. If L does enter $s_i$ from the left through a vertex, then
any increase in the y-intercept if it enters at the top, and decrease if
it enters at the bottom, will change the digitization of L to exclude $s_i$.
Thus the boundary of the feasible region for lines going through $s_1$, $s_i$,
and $s_n$ is obtained from the feasible region for $s_1$ and $s_n$ by cutting
the region by those curves corresponding to all lines passing through
the lower and upper lefthand vertices of $s_i$. These curves are actually
straight lines. Let $(x_i,y_i)$ denote the lower lefthand corner of $s_i$.
Then any line through $(x_i,y_i)$ satisfies $y_i = mx_i + b$ or equivalently
$m = (y_i - b)/x_i$. Thus the set of all lines passing through $(x_i,y_i)$ is given
by the set $\{(b,\ y_i/x_i - b/x_i) : b \in \mathbb{R}\}$. This set is just a line in
y-intercept, slope space. This argument may be extended inductively
to obtain the feasible region for any subset of a digital edge by
adding one pixel at a time. Each feasible region is obtained from
the previous one by intersecting it with the region contained between
the two parallel lines indicated above.

The computation of the feasible region can be performed rapidly
by testing each vertex, going sequentially around the feasible polygon,
to determine whether it lies between or outside the next pair of lines.
This procedure tells us between which pairs of vertices the parallel
lines intersect the polygon. Thus only four intersections need be
computed for each extension.
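
The construction above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions, not the authors'
implementation: each pixel with lower lefthand corner (x, y) is assumed
to constrain a line y = m*x + b by y <= m*x + b <= y + 1 (the line
crosses the pixel's left edge), the starting region is a large bounding
box in (b, m) space rather than the quadrilateral described above, and
the names `clip_halfplane` and `feasible_polygon` are hypothetical.

```python
def clip_halfplane(poly, a, c, d):
    """Keep the part of a convex polygon where a*b + c*m <= d.

    `poly` is a list of (b, m) vertices in order around the polygon.
    """
    out = []
    for idx, p in enumerate(poly):
        q = poly[(idx + 1) % len(poly)]
        fp = a * p[0] + c * p[1] - d
        fq = a * q[0] + c * q[1] - d
        if fp <= 0:
            out.append(p)          # p satisfies the constraint
        if fp * fq < 0:            # edge p->q crosses the boundary line
            t = fp / (fp - fq)
            out.append((p[0] + t * (q[0] - p[0]),
                        p[1] + t * (q[1] - p[1])))
    return out


def feasible_polygon(pixels, bound=100.0):
    """Intersect, pixel by pixel, the strips y <= m*x + b <= y + 1.

    `pixels` is a list of (x, y) lower lefthand corners. The result is
    a convex polygon in (b, m) space, or [] if no line is feasible.
    `bound` is an assumed box large enough to contain the answer.
    """
    poly = [(-bound, -bound), (bound, -bound), (bound, bound), (-bound, bound)]
    for x, y in pixels:
        poly = clip_halfplane(poly, -1.0, -float(x), -float(y))      # m*x + b >= y
        poly = clip_halfplane(poly, 1.0, float(x), float(y) + 1.0)   # m*x + b <= y + 1
    return poly
```

Each pixel contributes one pair of parallel cutting lines, so adding a
pixel costs one pass around the current polygon, in the spirit of the
vertex-by-vertex test described above.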

Given a feasible polygon, it is possible to compute a y-value at
which the width of the feasible region in x-y space is minimized. This
is analogous to the channel of thickness 1/q discussed in Section 2.3.

As in the case of the feasible region in x-y space for a digital line
segment, the feasible region in x-y space for a subset of a digital line
segment is obtained by drawing the lines corresponding to the vertices
of the feasible polygon. For each x-value the feasible region extends
from the lowest point on these lines to the highest point over the
specified x value. The minimum width can be shown to be achieved at a
point where two of the lines cross. Thus to compute the minimum width,
evaluate the width at each intersection of lines. For n lines, we have
n(n-1)/2 intersections, so for moderate size subsets, say 8-10 points,
this computation is quite fast.
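
The pairwise search can be sketched as follows. This is a hedged
illustration only: each polygon vertex (b, m) is taken to represent the
line y = m*x + b, the width at x is the vertical gap between the highest
and lowest of these lines, and the function name `min_vertical_width`
is hypothetical.

```python
def min_vertical_width(vertices):
    """Return (x, width) minimizing the vertical extent of the lines.

    `vertices` is a list of (b, m) pairs, one per vertex of the feasible
    polygon. Only crossings of pairs of lines are examined, so the cost
    is n(n-1)/2 width evaluations for n lines.
    """
    def width(x):
        ys = [m * x + b for b, m in vertices]
        return max(ys) - min(ys)

    best = None
    for i in range(len(vertices)):
        for j in range(i + 1, len(vertices)):
            (b1, m1), (b2, m2) = vertices[i], vertices[j]
            if m1 == m2:
                continue                    # parallel lines never cross
            x = (b2 - b1) / (m1 - m2)       # abscissa where the two lines meet
            w = width(x)
            if best is None or w < best[1]:
                best = (x, w)
    return best
```

For three lines through a common point, for example
`min_vertical_width([(0, 0), (-1, 1), (-2, 2)])`, the width collapses to
zero at that point.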

We are now able to assign to each subset of a digital line a point
at which its feasible region width is minimized. Using the midpoint of
this strip as an estimate for a point on the line, we can now give an
upper bound on the registration error for any given subset.

The  above  procedure  can  be  used  to  provide  error  bounds  for  any 
given  subset,  but  we  would  like  to  get  some  general  measure  of  the 
success  of  the  method.  One  approach  would  be  to  compute  the  maximum 
error  for  every  subset  of  every  digital  line  through  the  pixel  at  the 
origin  and  with  slope  between  zero  and  one  and  specified  length,  say 
ten  pixels.  If  we  generate  each  digital  line  and  take  all  its  subsets 
we  generate  approximately  136,000  sets,  though  they  need  not  all  be 
distinct  since  lines  can  share  subsets.  By  computing  the  error  for  each 
of  these  subsets,  it  is  possible  to  determine  the  expected  maximum  error 
for  subsets  of  a given  size.  It  would  also  be  possible  to  determine 
those  approximate  slopes  of  digital  lines  which  are  best  in  that  the 
expected  maximum  error  is  minimized. 

We plan to carry out the registration accuracy studies described
above. These results will then be used to assess the quality of edge
detection needed to assure subpixel registration accuracy. We then
would like to use additional information such as gradients to provide
further accuracy.


Section  2.7  Digital  Line  - Points  Missing,  Points  Added 


For  most  images  it  is  impossible  to  guarantee  that  any  set  deemed  to 
be  a subset  of  the  digitization  of  a real  line  is  correct.  In  Section  2.6 
we  discussed  the  accuracy  attainable  when  a subset  of  a digital  line  is 
available. The modeling of the loss of accuracy resulting from the
presence  of  incorrect  pixels  appears  to  be  quite  complex.  Our  initial 
plans  for  study  of  this  problem  will  involve  the  addition  of  varying 
numbers  of  incorrect  points  to  a small  number  of  subsets  of  digital  lines 
to  determine  the  resulting  error.  The  planning  of  this  work  is  in  an 
early  stage. 

One aspect of the incorrect points problem deserves mention. The
knowledge that the digital edge comes from a straight edge provides a
powerful constraint on the feasible lines. Given a set of pixels, it is
possible to determine, for each digital line, how many pixels it has in
common with the observed pixel set. If we know the approximate beginning
and ending of the line segment, the number of digital lines passing
through a substantial percentage of the observed pixels will be small.
The feasible region for the digital line maximizing the number of pixels
hit is a reasonable candidate for the correct digital line. If more than
one line maximizes this quantity, the feasible region can be extended to
the union of the feasible regions of these digital lines. We intend to
pursue this approach in our later work.
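
The pixel-counting idea can be illustrated with a small Python sketch.
It rests on assumptions not stated in the text: a line y = m*x + b is
digitized by taking the pixel (x, floor(m*x + b)) in each column x, the
candidate lines come from a user-supplied grid of slopes and intercepts,
and the names `digitize` and `best_lines` are hypothetical.

```python
import math


def digitize(m, b, xs):
    """Pixels (x, floor(m*x + b)) hit by the line y = m*x + b in columns xs."""
    return {(x, math.floor(m * x + b)) for x in xs}


def best_lines(observed, slopes, intercepts):
    """Return (max_hits, winners): the candidate lines sharing the most
    pixels with the observed set, where winners is a list of (m, b)."""
    xs = sorted({x for x, _ in observed})
    best, winners = -1, []
    for m in slopes:
        for b in intercepts:
            hits = len(digitize(m, b, xs) & observed)
            if hits > best:
                best, winners = hits, [(m, b)]
            elif hits == best:
                winners.append((m, b))   # ties: keep every maximizing line
    return best, winners
```

When several candidates tie, `winners` lists all of them, matching the
suggestion above of taking the union of their feasible regions.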
Section 3.0 Subpixel Translation-Registration of Stationary Random Fields

Consider the problem of registering (i.e., finding an appropriate
overlay by relative translation of) a sensed planar image with respect to
a larger reference image supposed to contain it. In typical remote-sensing
applications, both the sensed and reference images will be given, at the
same resolution, as arrays of gray-level values, one value for each pixel.
Both images will typically be noisy, due to minor changes in weather or
ground features; to sensor characteristics; to preprocessing and
detrending; and possibly also to nonlinear filtering of gray-level images,
for example by edge-enhancers and thresholding.

The primary model assumptions for our discussion of this problem are:

(a) there exist underlying continuous sensed and reference images
$Z_S(x)$ and $Z_R(x)$ before discretization into pixels, where
$x = (x_1,x_2)$ are planar coordinates, such that $Z_R(\cdot)$ and
$Z_S(\cdot)$ are jointly strictly stationary random fields (i.e., have
translation-invariant statistics) with rapidly decaying dependence
between the fields $(Z_R(x+y), Z_S(x+y))$ and $(Z_R(y), Z_S(y))$ as a
function of $|x| = (x_1^2 + x_2^2)^{1/2}$ (see [De] for precise
conditions and definitions: $Z_R$ and $Z_S$ must be $\phi$-mixing with
$\sum_{r=1}^{\infty} r\,\phi^{1/2}(r) < \infty$);

(b) there exists an unknown translation-parameter
$\theta = (\theta_1,\theta_2)$, a known pixel width $h$, and a known
kernel-function $K(\cdot,\cdot)$ such that the observed sensed and
reference gray-level arrays are

    $X_S(j,k) = h^{-2}\int_0^h\int_0^h K(s,t)\,Z_S(jh+\theta_1+s,\ kh+\theta_2+t)\,ds\,dt$

    $X_R(j,k) = h^{-2}\int_0^h\int_0^h K(s,t)\,Z_R(jh+s,\ kh+t)\,ds\,dt$


The fields $Z_R$ and $Z_S$ are of course assumed to be highly correlated
images representing the same ground truth, and for identifiability of
location it is quite important that the correlation between $Z_R(x)$ and
$Z_S(x+y)$ be small except for $y$ close to 0. The parameter $\theta$ is
then identifiable in principle from large images
$(Z_R(x))_{|x_1|,|x_2| \le Mh}$ and $(Z_S(y+\theta))_{|y_1|,|y_2| \le Lh}$.
To see whether and to what extent $\theta$ remains identifiable from data
$\{X_R(j,k): |j|,|k| \le M\}$ and $\{X_S(j,k): |j|,|k| \le L\}$
is precisely our problem. Note that the kernel function $K$ models the
linear transformation of a pixel image to a gray level. For simplicity
(although all our results can be extended to general known $K$), and in
apparent agreement with previous researchers, we assume in what follows
that $K(s,t) = 1$.
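
With $K(s,t) = 1$, assumption (b) says that each gray level is the
average of the continuous image over one pixel. A minimal sketch of
this discretization, assuming the continuous image is available only
as samples on a fine grid with `block` samples per pixel side (the
function name `pixelize` and the grid representation are illustrative
assumptions, not part of the model):

```python
def pixelize(fine, block):
    """Average non-overlapping block x block cells of a fine-grid image.

    `fine` is a list of rows of numbers; the result approximates the
    pixel averages h^-2 * (integral of Z over each pixel) when K = 1.
    """
    rows = len(fine) // block
    cols = len(fine[0]) // block
    out = []
    for j in range(rows):
        row = []
        for k in range(cols):
            total = sum(fine[j * block + a][k * block + c]
                        for a in range(block) for c in range(block))
            row.append(total / (block * block))
        out.append(row)
    return out
```

A translation by a fraction of a pixel can then be imitated by shifting
the fine grid by a number of fine samples before averaging.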

Our model assumptions are in some respects similar to, but substantially
generalize, those of [Mo-Sm] (who were, however, interested also in the
effects of affine distortion). In addition to (a), [Mo-Sm] assumed that
$Z_R(\cdot)$ and $Z_S(\cdot+\theta)$ are directly observable and jointly
Gaussian. This restrictive assumption is not necessary for an
understanding of the asymptotic distribution theory, for large sensed
images, of the maximum-correlation estimator for $\theta$ (see below).
Moreover, [Mo-Sm] do not take into account the transformation of
$Z_R$, $Z_S$ which renders only $X_R$, $X_S$ directly observable.
Thus their analysis, which we extend and improve in Section 3.1 of this
report, is relevant only to the problem of consistent estimation of
$\theta$ in the sense of "correct local registration". We consider in
Section 3.2 theoretical approaches based on model (a)-(b) above to the
evaluation of sub-pixel accuracy of estimation. A summary of our
findings, together with proposals for further empirical and Monte Carlo
studies, concludes this section.

Section 3.1 Neighborhood-Consistency of Maximum-Correlation Estimation

The reason that we do not need to assume Gaussian distributions for
gray-levels is simply that the fixed-offset "correlation" statistic for
$Z_R(\cdot)$, $Z_S(\cdot+\theta)$ given by

    (*)  $C(t) = (2T)^{-2}\int_{-T}^{T}\int_{-T}^{T} Z_R(x+t)\,Z_S(x+\theta)\,dx,\quad T \equiv Lh,$

is asymptotically weakly convergent as a random process in $t$ as
$L \to \infty$ to a Gaussian random field, under the precise condition
of [De] on decay of dependence mentioned in (a). If $Z_R(\cdot)$ and
$Z_S(\cdot+\theta)$ are directly observable, then a natural statistic to
estimate $\theta$ is

    $\theta^* =$ maximizer of $C(\cdot)$ on $[-T,T] \times [-T,T]$.
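
A discrete analogue of this estimator can be sketched as follows. The
sketch assumes pixel arrays rather than the continuous fields of (*),
integer offsets only, and a reference array padded by `max_shift`
pixels on every side of the sensed array; the function names are
hypothetical.

```python
def correlation_surface(ref, sensed, max_shift):
    """Average product of sensed and shifted reference values.

    `sensed` is q x p; `ref` must be (q + 2*max_shift) x (p + 2*max_shift).
    Returns a dict mapping each integer offset (a, b) to its correlation.
    """
    q, p = len(sensed), len(sensed[0])
    surface = {}
    for a in range(-max_shift, max_shift + 1):
        for b in range(-max_shift, max_shift + 1):
            total = 0.0
            for j in range(q):
                for k in range(p):
                    total += sensed[j][k] * ref[j + max_shift + a][k + max_shift + b]
            surface[(a, b)] = total / (q * p)
    return surface


def max_correlation_offset(ref, sensed, max_shift):
    """Integer offset at which the correlation surface is largest."""
    surface = correlation_surface(ref, sensed, max_shift)
    return max(surface, key=surface.get)
```

The whole surface is returned as well as its maximizer, since the
figures of merit discussed next concern the behavior of the correlation
as a random field over the search window, not at a single offset.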

The most easily interpreted figures of merit for this (and any other)
estimator are of the form

    $Q(\tau) = P\{\,|\theta^* - \theta| \le \tau\,\}$

or

    $Q_{T_0}(\tau) = P\{\sup_{t:\ \|t-\theta\| \le \tau} C(t) = \sup_{t:\ \|t\| \le T_0} C(t)\}$

where $\|x\| = \max(|x_1|,|x_2|)$ and $T_0$ is a fixed size of window
inside which $\theta$ may assume values. We note that since [Mo-Sm] did
not treat $C(\cdot)$ as a random field, they did not propose to evaluate
quantities $Q_{T_0}(\tau)$ but rather to compare the asymptotically
(in $T$) normal single-offset correlations $C(t)$ with either specified
or "sidelobe" thresholds.

Evaluation of $Q_{T_0}(\tau)$ is clearly a problem about random
processes - not simply finite-dimensional distributions - for which we
now formulate an asymptotic solution, assuming (a).

Let $D(t)$ denote the expectation $EC(t)$. Joint stationarity of
$Z_R(\cdot)$ and $Z_S(\cdot+\theta)$ implies

    $D(t) = EC(t) = E\{Z_R(t)\,Z_S(\theta)\}$

which would be consistently estimated when $T$ is large by the expression
$C(t)$ in (*). (In other words, [De]'s conditions imply a law of large
numbers for $C(t)$ for each $t$.) The stationary covariance function

    $V(x-y) \equiv \mathrm{Cov}(C(x),C(y)) \sim T^{-2}\sigma(x-y)$ as $T \to \infty$

(which defines the asymptotic covariance $\sigma(\cdot)$) can likewise be
consistently estimated by a fourfold integral expression (cf. [Mo-Sm],
where some simplifications occur if $Z_R$ and $Z_S$ are jointly
Gaussian). The following Lemma and Corollary apply in particular to
estimate the probability
$Q^*_{T_0}(\tau) = P(\sup\{C^T(t) : \|t\| \le T_0,\ \|t-\theta\| \ge \tau\})$
where $C^T(\cdot)$ is the Gaussian random field with the same mean and
covariance as $C(\cdot)$ for fixed $T$. For more general conditions of
applicability, see Appendix.


LEMMA 3.1 Let $Y(t)$ be a real-valued separable random field on
$[-T_0,T_0]^d$ and $S$ be the complement in $[-T_0,T_0]^d$ of a convex
set such that

    $\sup\{\|t\| : t \in S^c\} \le T_0 - n^{-2}$

where $n$ is a fixed integer $\ge 3$. Suppose also that for
$s, t \in S$, for fixed $\Gamma > 0$ and a non-decreasing continuous
function $\psi$, that

    (+)  $|Y(t)|/\Gamma$ and $|Y(s) - Y(t)|/\psi(\|t-s\|)$

are each stochastically smaller than the absolute value of a standard
normal random variable, where
$\int^{\infty}\psi(\exp(-x^2))\,dx < \infty$. Then for any
$x \ge (4d\log n)^{1/2}$

00  2 2 

P(sup  | Y (t)  | > x(r  + — ^ f^(n-u  )du)  } < C (n)  ( e“u  /2du 

tes  ” ” & -1  1 d 'x 


where 

Cd(n)  - V27n(l*^(/2  - ) •Ct2T0n2l)d 

and $\lceil\ \rceil$ denotes "roof function".

The proof, which we omit, is a direct adaptation of the method of
[Ma], in which proper attention was not drawn to the very weak use made of
the Gaussian assumption: the assumption (+) above is of course satisfied
if $Y(\cdot)$ is Gaussian with
$\Gamma = \sup_{t \in S}[E(Y^2(t))]^{1/2}$ and
$\sup_{s,t \in S:\ \|s-t\| \le u} E[Y(s) - Y(t)]^2 \le \psi^2(u)$.
The method of [Ma] does permit some further relaxation of (+) at the
cost of more complicated estimates.


COROLLARY 3.2 Assume (a), (*), and fix $\tau > 0$. Let $\theta$ be such
that $\|\theta\| \le T_0 - n^{-2}$ for fixed $n \ge 3$. Define for
fixed $\tau$

    $H_\tau = \inf\{D(\theta) - D(t):\ \|t-\theta\| > \tau,\ \|t\| < T_0\}$

Assume also that (+) of the Lemma holds with
$Y(t) = C(t) - C(\theta) - D(t) + D(\theta)$, $d = 2$,
$S = \{t:\ \|t\| \le T_0,\ \|t-\theta\| \ge \tau\}$, and
$\psi(u) \le a\,u^b$ with $b > 0$. Then

    $P(\sup\{C(t):\ \|t\| < T_0,\ \|t-\theta\| > \tau\} > C(\theta)) \le C_2(n)\int_{x^*}^{\infty} e^{-u^2/2}\,du,$


, , 2a  / oo 

x*  = Hf/Cy? T 


The Corollary follows immediately from the Lemma using $Y(\cdot)$ defined
above since

    $P(\sup\{C(t):\ \|t\| < T_0,\ \|t-\theta\| > \tau\} > C(\theta)) \le P(\sup_{s \in S}|Y(s)| > H_\tau).$

To make the conclusion of the Corollary more specific, we note that if
$C(\cdot)$ were Gaussian then $\Gamma$ can be taken
$(2V(0))^{1/2} = (2\sigma(0))^{1/2}/T$ while $\psi^2(u)$ can be taken
$= 2(V(0) - V(u))$; if $V(\cdot)$ can be assumed differentiable at 0
(or, more conservatively, to allow covariances such as $\exp(-|u|)$,
Hölder continuous of exponent 1/2), then $b \ge 1/4$ and $a$ will be of
order $T^{-1}$. Choosing $n = \lfloor T^2 \rfloor$, and assuming the
hypotheses of the Corollary, we find

    (A)  $1 - Q_{T_0}(\tau) \le 8\,T_0^2\,T^4\,\dfrac{\exp[-(H_\tau/(\Gamma+\varepsilon))^2/2]}{H_\tau/(\Gamma+\varepsilon)},\quad \varepsilon = O(T^{-1-b}).$


This bound, which should be quite good for sensed images of practical
sizes, suggests as figure of merit for local accuracy of registration the
ratios $H_\tau^2/V(0)$. These ratios can, for instance, be estimated
accurately from a large reference image alone if the noise field
$Z_S(\cdot+\theta) - Z_R(\cdot)$ is independent of $Z_R(\cdot)$ with
known covariance structure. It remains as a subject for numerical
experimentation, with real and simulated images of various sizes, to
test both the validity and stringency of the bound (A).

Results  related  to  (A) , with  bounds  giving  exponential  decrease 
with  T of  probabilities  of  misregistration,  have  been  obtained  for  a 
somewhat  different  model  in  unpublished  research  of  C.  Herman.  Herman 
considers  a model  in  which  pixel  gray-levels  are  independent  and  regionally 
identically  distributed  for  a finite  (small)  number  of  geometrically 
identifiable  homogeneous  regions.  Thus  his  work,  while  more  special  in 
its  model  of  noise,  does  allow  for  some  nonhomogeneity  over  the  sensed  and 
reference  images.  This  suggests  (and  we  propose  in  Section  4)  that  the 
empirical  testing  of  (A)  should  cover  nonstationary  images  as  well. 

See  Appendix  1 for  some  modifications  of  the  Corollary  in  this  direction. 

Section 3.2 Interpolation Using Pixel-Discretized Images

We turn now to the question of estimating $\theta$ to sub-pixel accuracy
based on data $X_R(\cdot)$, $X_S(\cdot)$. That is, given observations
$\{X_R(j,k), X_S(j,k)\}_{|j|,|k| \le L}$, and supposing it is known that
$j^*h \le \theta_1 < (j^*+1)h$, $k^*h \le \theta_2 < (k^*+1)h$, we want
to know which characteristics of the random fields $Z_R$ and $Z_S$
control the possibility of a finer estimation of $\theta$. For
simplicity and definiteness, we assume in this Section, in addition to
(a), (b) above with $K = 1$:

(c) $Z_N(\cdot) = Z_S(\cdot+\theta) - Z_R(\cdot)$ is independent of
$Z_R(\cdot)$.

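Under our assumptions, the "correlation-statistics" at offset
$x = (\alpha h,\beta h)$

    (**)  $C(\alpha h,\beta h) = (2L)^{-2}\sum_{j=-L}^{L}\sum_{k=-L}^{L} X_S(j,k)\,X_R(j+\alpha,\ k+\beta)$

have expectation (assuming the means of $Z_R$, $Z_S$ have been centered
to 0)

    $D(x) = D(\alpha h,\beta h) = \int_{R^2} e^{i\lambda\cdot(x-\theta)}\left|\dfrac{e^{i\lambda_1 h}-1}{\lambda_1 h}\right|^2\left|\dfrac{e^{i\lambda_2 h}-1}{\lambda_2 h}\right|^2 G(d\lambda)$

where $G(\cdot)$ is defined by

    $\mathrm{Cov}(Z_R(0), Z_R(x)) = \int_{R^2} e^{i\lambda\cdot x}\,G(d\lambda).$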

Now as $T = Lh$ gets large, the covariances among all $C(x)$ variables
go to 0, and the statistical aspect of finding the $\theta^*$ which
maximizes $C(\cdot)$ disappears: all that remains is the interpolation
problem of finding a numerically estimated maximizer $\theta$ for
$D(\cdot)$ on $[j^*h, j^*h+h] \times [k^*h, k^*h+h]$. In fact, since
$D(\cdot)$ is "observable" (through $C(x)$) only at lattice-points
$(\alpha h,\beta h)$ where $\alpha$ and $\beta$ are integers, it is clear
that without some assumptions on the functional form of $D(\cdot)$ or
some prior knowledge about approximate constancy of curvature of
$D(\cdot)$ on $[j^*h, j^*h+h] \times [k^*h, k^*h+h]$, no precise
subpixel estimation is possible.

In order to derive an index of how precisely one can hope to estimate
$\theta$ from $\{D(\alpha h,\beta h): \alpha,\beta$ integers$\}$ (and it
is clear that for finite $T$, the observability of $C$ rather than $D$
can only degrade the accuracy of prediction) we expand the
Fourier-Stieltjes representation of $D(x)$ in a Taylor series about
$\theta$. For this we require the following assumption, which would
follow from but can be slightly weaker than mean-square
differentiability of $Z_R(\cdot)$:

    (d)  $\xi_h \equiv \int\!\!\int \dfrac{|(e^{i\lambda_1 h}-1)(e^{i\lambda_2 h}-1)|^2}{h^4\lambda_1^2\lambda_2^2}\,|\lambda|^4\,G(d\lambda) < \infty$
Under assumptions (a) - (d), we can write by the Mean Value Theorem

    (B)  $D(\theta) - D(x) = \int\!\!\int \dfrac{|(e^{i\lambda_1 h}-1)(e^{i\lambda_2 h}-1)|^2}{2h^4\lambda_1^2\lambda_2^2}\,[\lambda\cdot(x-\theta)]^2\,G(d\lambda) + \dfrac{\gamma}{24}\,\xi_h\,\|x'-\theta\|^4$

where $x'$ lies on the line between $x$ and $\theta$, and
$|\gamma| \le 1$. We now suppose that $x$, and thus also $x'$, lies in
the pixel $P = [j^*h, j^*h+h] \times [k^*h, k^*h+h]$ containing
$\theta$, and remark that if, in addition to (d), $Z_R(\cdot)$ is twice
mean-square differentiable then $\xi_h$ is uniformly bounded in $h$.

From (B) it follows that the maximizer of $D(x)$, known to lie in the
pixel $P$ containing $\theta$, is at most a distance $\kappa h$ from
$\theta$, where

    $\kappa = \left(\tfrac{1}{12}\,\xi_h/a_1\right)^{1/2} h$

and $a_1$ = smallest characteristic value of the quadratic form

    $q(y) = h^{-4}\int\!\!\int |(e^{i\lambda_1 h}-1)(e^{i\lambda_2 h}-1)|^2\,\lambda_1^{-2}\lambda_2^{-2}\,(\lambda\cdot y)^2\,G(d\lambda).$

Moreover, asymptotically for small $h$, $\kappa$ gives the approximate
best-possible fraction-of-pixel accuracy attainable if $D(\cdot)$ is
maximized via a local quadratic approximation. That is, knowledge of
$\xi_h$ and $q(\cdot)$ alone would allow no better bound on accuracy of
numerical maximization of $D(\cdot)$.
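
To make "maximized via a local quadratic approximation" concrete, here
is a minimal sketch that refines a lattice maximum by fitting a parabola
through three samples along each axis. It is a simplified stand-in
(separable one-dimensional fits) for a full quadratic surface fit, and
the function names are hypothetical.

```python
def parabolic_offset(f_minus, f_zero, f_plus):
    """Vertex of the parabola through (-1, f_minus), (0, f_zero), (1, f_plus).

    Returns the vertex abscissa, which lies in (-1, 1) when f_zero is the
    largest of the three samples.
    """
    denom = f_minus - 2.0 * f_zero + f_plus
    if denom >= 0:
        raise ValueError("centre sample is not a strict local maximum")
    return 0.5 * (f_minus - f_plus) / denom


def subpixel_peak(c, j, k, h=1.0):
    """Refine the lattice maximum c[j][k] of a correlation array.

    `c` is a 2-D list of correlation values, (j, k) an interior index of
    its largest entry, and `h` the pixel width. Returns the estimated
    peak location in the same units as h.
    """
    dj = parabolic_offset(c[j - 1][k], c[j][k], c[j + 1][k])
    dk = parabolic_offset(c[j][k - 1], c[j][k], c[j][k + 1])
    return ((j + dj) * h, (k + dk) * h)
```

The fit is exact when the underlying surface really is quadratic; how
far a real correlation surface departs from that form is what the index
$\kappa$ above is meant to quantify.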

Thus in the limit of infinite $T = Lh$ the parameter $\kappa$ is an
easily interpretable figure of merit for subpixel estimation of $\theta$
based on locally biquadratic surfaces fit to $D(\cdot)$ (or
equivalently, to $C(\cdot)$). It remains to suggest approximate
computational procedures, for use with real or simulated images
$X_S(\cdot)$, to estimate $\theta$ and $\kappa$. In fact, for very small
$h$ (that is, rapid correlation decay for processes $X_R$, $X_S$ over
distance scale $h$, at least if $Z_R(\cdot)$ is twice mean-square
differentiable) $\xi_h$ should be closely approximated by

    $E[(\nabla_{1h} + \nabla_{2h})^2 Z_R(0)]^2 \approx E[(\nabla_1 + \nabla_2)^2 X_R(0)]^2 \equiv \tilde{\xi}_h$

and similarly the quadratic form $q(y)$ is approximately given by

    $\tilde{q}(y) = E[(y_1\nabla_1 + y_2\nabla_2)X_R(0)]^2$

where for any function $f(x)$ on $R^2$,
$\nabla_{1h}f(x) = f(x) - f(x_1-h, x_2)$ and
$\nabla_{2h}f(x) = f(x) - f(x_1, x_2-h)$, and for function $g(j,k)$,
$\nabla_1 g(j,k) = g(j,k) - g(j-1,k)$,
$\nabla_2 g(j,k) = g(j,k) - g(j,k-1)$.

Now stationarity and ergodicity of $X_R(\cdot)$ implies that consistent
estimators (as $L \to \infty$) are given by

    $\hat{\xi}_h \equiv (2L)^{-2}\sum_{j=-L}^{L}\sum_{k=-L}^{L}[(\nabla_1 + \nabla_2)^2 X_R(j,k)]^2$

    $\hat{q}(y) = (2L)^{-2}\sum_{j=-L}^{L}\sum_{k=-L}^{L}[(y_1\nabla_1 + y_2\nabla_2)X_R(j,k)]^2.$

Letting $\hat{a}_1$ denote the smallest characteristic value of the
quadratic form $\hat{q}(\cdot)$, we can define the estimated
figure-of-merit for subpixel accuracy

    $\hat{\kappa} = h\left(\tfrac{1}{12}\,\hat{\xi}_h/\hat{a}_1\right)^{1/2}$

As with the figure-of-merit defined in Section 3.1, we must still perform
numerical experiments with real and simulated pixel-discretized images to
test both the correctness and informativeness of the subpixel-accuracy
bound $\kappa$.
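
The difference-based estimators can be sketched in Python as follows.
This is an illustration under assumptions: the sums run over the
interior pixels where all backward differences exist, the normalization
is the number of terms actually summed rather than $(2L)^2$, and the
function name `difference_estimators` is hypothetical. The sketch stops
at the smallest characteristic value and does not form the final
figure-of-merit.

```python
import math


def difference_estimators(x):
    """Estimate xi_hat, the 2x2 matrix of q_hat, and its smallest eigenvalue.

    `x` is a 2-D list of reference gray levels X_R(j, k). Backward
    differences are used: D1 g(j,k) = g(j,k) - g(j-1,k), and similarly D2.
    """
    rows, cols = len(x), len(x[0])
    xi_sum = m11 = m12 = m22 = 0.0
    count = 0
    for j in range(2, rows):
        for k in range(2, cols):
            d1 = x[j][k] - x[j - 1][k]
            d2 = x[j][k] - x[j][k - 1]
            d11 = x[j][k] - 2 * x[j - 1][k] + x[j - 2][k]
            d22 = x[j][k] - 2 * x[j][k - 1] + x[j][k - 2]
            d12 = x[j][k] - x[j - 1][k] - x[j][k - 1] + x[j - 1][k - 1]
            xi_sum += (d11 + 2 * d12 + d22) ** 2   # [(D1 + D2)^2 X]^2
            m11 += d1 * d1
            m12 += d1 * d2
            m22 += d2 * d2
            count += 1
    xi_hat = xi_sum / count
    a, b, c = m11 / count, m12 / count, m22 / count
    # q_hat(y) = a*y1^2 + 2*b*y1*y2 + c*y2^2; smallest eigenvalue of [[a, b], [b, c]]
    a1_hat = 0.5 * (a + c) - math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return xi_hat, ((a, b), (b, c)), a1_hat
```

A planar gray-level ramp gives a smallest characteristic value of zero,
which is the degenerate case in which translation along one direction
cannot be detected at all.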

If the estimated accuracy $\hat{\kappa}$ is < .5 and at the same time the
estimated bounds for $1 - Q_{T_0}(2h)$ from Section 3.1 are extremely
tight (say < .001), there still remains the problem of constructing an
interpolation-estimator for the maximizer $\theta$ of $D(\cdot)$ based on
the noisy values $C(\alpha h,\beta h)$. The best developed methodology
for estimating (interpolated) values $D(x)$ linearly from observations
$\{C(\alpha h,\beta h)\}_{\alpha,\beta}$, called "kriging" (see [Du] and
[Ri], Section 4.4), suffers from one glaring defect in this context,
namely: it requires that the covariances for $C(\cdot)$ be known (or
estimated) at all points $x$, not simply at lattice points
$(\alpha h,\beta h)$. If for experimental purposes (as in [Mo-Sm]) we
assume a special parametric form for the covariance functions of
$Z_R(\cdot)$ and $Z_S(\cdot)$, then a parameter-estimation step followed
by kriging-interpolation and maximization (using the kriging equations
given by [Du] and [Ri]) will give a usable procedure for subpixel
registration. This has not yet been tried.

Section 3.3 Summary and Proposed Numerical Experiments

We collect in this brief final Section our main theoretical results, and
the corresponding numerical tests they suggest on real and simulated
image data.

(i) For large continuous sensed images with conditionally Gaussian noise
given the reference image (see Appendix), formula (A) in Section 3.1
bounds the probability of misregistration by more than distance $\tau$.
Numerical work with pixel-discretized real and simulated images is needed
to test the validity and usefulness of the bound.

(ii) When translation-registration to the nearest pixel has already been
accomplished, and all imagery can be assumed spatially homogeneous with
rapidly decaying correlations on the pixel distance-scale $h$, the
estimator $\hat{\kappa}$ from Section 3.2 approximately limits the
subpixel accuracy possible if the sensed and reference images were
infinitely large with noise- and reference-images stochastically
independent. Again, numerical experimentation will empirically determine
whether these assumptions and figures-of-merit are valid or useful.

(iii) The kriging-interpolation and maximization of $C(\cdot)$ should
certainly be tried, as sketched at the end of Section 3.2, using simple
parametric forms for the covariances of $Z_R$ and $Z_S$.

(iv) Finally, if the experiments in (i) - (iii) prove successful,
theoretical and empirical extensions of this work, to the case of
registration with respect to affine distortion considered by [Mo-Sm],
seem both desirable and possible.

APPENDIX. Probability Bound on Local Registration Error for Conditionally
Gaussian Sensed Images.

In this appendix we state and prove a Theorem giving the most important
case (including that of [Mo-Sm] in nonstationary settings) in which the
hypotheses of the Lemma and Corollary of Section 3.1 can be proved.

THEOREM A.1. Suppose that $Z_R(\cdot)$ and $Z_S(\cdot+\theta)$, with
$\|\theta\| \le T_0 - n^{-2}$, are (nonhomogeneous) real-valued separable
random fields on $R^2$ for which
$Z_N(\cdot) \equiv Z_S(\cdot+\theta) - Z_R(\cdot)$ is conditionally given
$Z_R(\cdot)$ a Gaussian random field, and for which the covariance
function $R(s,t)$ of $Z_R(\cdot)$ is continuous and satisfies for
$b, c > 0$

(c)  sup  {R(s_,£)  + R(_t,t)  - 2R(£,t)  : | |_s  - t_|  | _<  u}  <_  c«u^(-logu)-^^, 

0 < u < 1 

Let:

    $M_N(t) = E[Z_N(t) \mid \{Z_R(\cdot)\}]$

    $P_N(s,t) = E[Z_N(s)Z_N(t) - M_N(s)M_N(t) \mid \{Z_R(\cdot)\}]$

    $D(t) = E[C(t) \mid \{Z_R(\cdot)\}]$ where $C(\cdot)$ is as in (*) for fixed $T$,

    $V(t) = E[(C(t) - D(t))^2 \mid \{Z_R(\cdot)\}]$

    $\Gamma^2 = \sup\{V(t):\ \|t\| \le T_0\}$

^(u)  = 2 • sup  <2T)-4  / [Z  (x  + s)  - Z (x  + t)l* 

llllli.lltll^o  r~  - R“  " 

I Is.  " l\  lxlu 

[ZR(£  + s)  - zR(y_  + t)  1 PN (x  + + 0)dxdy_ 

    $H_\tau = \inf\{D(\theta) - D(t):\ \|t-\theta\| > \tau,\ \|t\| < T_0\}.$

Then  whenever  x = H / (,-  . — n-bu2du  + T)  > (Slogn)1^2,  where 

T 2 _ i i 

A = suptfCu)/^:  0 < u _<  1} 

    $P\{\sup(C(t):\ \|t\| \le T_0,\ \|t-\theta\| > \tau) > C(\theta) \mid \{Z_R(\cdot)\}\} \le C_2(n)\int_x^{\infty} e^{-u^2/2}\,du$

PROOF: First we remark that the condition (c) implies [Ad, Theorem 3.3.3
and Lemma 3.4.1] that $Z_R(\cdot)$ is uniformly Hölder continuous with
exponent $b$ on $[-T-T_0,\ T+T_0]^2$, hence uniformly bounded, and the
random variable $A$ has finite variance. The Lemma of Section 3.1 now
applies (with moments and probabilities all taken conditionally given
$Z_R(\cdot)$) to the conditionally Gaussian random field
$Y(t) = C(t) - C(\theta) - D(t) + D(\theta)$.

Here as in the Lemma, the conditionally Gaussian assumption could
be replaced by an assumption (+).

We observe also, as in Section 3.1, that if $M_N$ and $P_N$ are either
known or can be assumed to have a given form, then all other quantities
in the Lemma are defined in terms of a given (realization of the)
reference image $Z_R(\cdot)$.


Section 4.0 Maximum Likelihood Corner Detection

The reliable detection of edges, angles, and other geometric
configurations in sensed imagery is a key factor in many algorithms
designed to achieve subpixel registration accuracy. In particular,
locating image points to the correct pixel in a reference image allows a
decoupling of the pixel and subpixel registration problems. In this
chapter, we describe a maximum likelihood estimation procedure for
matching a sensor image corner with a reference image corner. This work
is related to work of Novak [No] on the estimation of curve matching
between a sensed and reference image.

Novak proposes a solution to the problem of finding a particular
edge shape in a picture (which is usually called the sensor image): the
edge is embedded in a binary template and, using an edge ratio statistic,
the template is matched to the sensor image.

An edge, Figure 4.2, is defined to be a curve separating two homogeneous
regions of differing grey levels. A template consists of an edge
along with a narrow band of pixels on both sides of the edge. It is
assumed that the pixels in the sensor image are statistically
independent, each being distributed exponentially. The pixels which lie
on the dark side of the curve have mean $\delta$, and those on the light
side have mean $\lambda$. The edge ratio statistic is $z = g_1/g_0$,
where $g_1$ is the sum of the grey levels of the pixels under the dark
region of the template, and $g_0$ is the sum under the light region. The
statistic is evaluated at all points of the sensor image, and that point
at which $z$ is a maximum is selected as the match point.

The selection by Novak of the edge ratio statistic appears to be
based on its "good" performance on several test cases, and on the fact
that its distribution can be reasonably approximated by the F
distribution. In this section Novak's model will be altered so as to
admit of a closed form MLE.

Section 4.1 The Model

In the modified model the template is assumed much larger than the
sensor image, and the problem takes the following form. Imagine that
the template represents a binary photograph (noiseless) of a scene in
which an edge separates a dark from a light region, as in Figure 4.2.
A noisy photograph (the sensor image) is taken of a part of this scene,
in general containing a segment of the edge. Thus given the template
and the sensor image, we seek the correct overlay point.

Formally we view the template $T$ as the lattice of $nm$ points

    t_{11}  t_{12}  ...  t_{1m}
    t_{21}  t_{22}  ...  t_{2m}
      .       .            .
      .       .            .
    t_{n1}  t_{n2}  ...  t_{nm},

in which each $t_{hk}$ is either 0 or 1. Thus $T$ is partitioned into the
sets $R_0$ and $R_1$, where

    $R_a = \{(h,k):\ t_{hk} = a,\ 1 \le h \le n,\ 1 \le k \le m\},\quad a = 0,1.$

The sensor image $S$ is the lattice of independent random variables

    S_{11}  S_{12}  ...  S_{1p}
    S_{21}  S_{22}  ...  S_{2p}
      .       .            .
      .       .            .
    S_{q1}  S_{q2}  ...  S_{qp}

in which the value $S_{ij}$ represents the grey level of the $(i,j)$th
pixel in the sensor image.

The distribution of each $S_{ij}$ depends on where $S$ is overlaid on
$T$. If $S_{11}$ is placed over $t_{hk}$, $1 \le h \le n-q$,
$1 \le k \le m-p$, then the conditional density of $S_{ij}$
($1 \le i \le q$, $1 \le j \le p$) is

    $f_{ij}(s: h,k) = f(s,\theta_0)\,I_0[i,j] + f(s,\theta_1)\,I_1[i,j],$

where $\{f(s,\theta)\}$ is a family of densities indexed by the parameter
$\theta$, and

    $I_a[i,j] = 1$ if $(i+h-1,\ j+k-1) \in R_a$, and $I_a[i,j] = 0$ otherwise,

$a = 0,1$. The dependence of $I_a$ on $(h,k)$ has been suppressed for
notational convenience.
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Section  4.2  Results 

Let $s$ represent the $q$ by $p$ matrix $[s_{ij}]$, where $s_{ij}$ is the
observed value of $S_{ij}$. Then the joint conditional density of $S$, given
that $S_{11}$ is placed over $t_{hk}$, is

\[
f_S(s:h,k) = \prod_{i=1}^{q} \prod_{j=1}^{p} f_{ij}(s_{ij}:h,k)
           = \prod_{i,j} \{ f(s_{ij},\theta_0)\, I_0[i,j] + f(s_{ij},\theta_1)\, I_1[i,j] \}.
\]

In view of the discussion at the beginning of the section, we will let
$f_S(s:h,k)$ be the conditional density of the sensor image given that the
correct overlay point is $(h,k)$, i.e., $S_{11}$ is placed over $t_{hk}$.
The likelihood function, which we take to be the logarithm of $f_S(s:h,k)$,
is

\[
L(s,h,k) = \sum_{i,j} I_0[i,j] \log f(s_{ij},\theta_0)
         + \sum_{i,j} I_1[i,j] \log f(s_{ij},\theta_1).
\]

Thus the maximum likelihood estimate of the correct overlay point is a
point $(a,b)$ for which

\[ L(s,a,b) = \max\{ L(s,h,k) : 1 \le h \le n,\ 1 \le k \le m \}. \]

Novak assumes each $S_{ij}$ is exponentially distributed, and we shall do
the same. Letting $f(s,\theta) = \exp\{-s/\theta\}/\theta$ ($\theta > 0$,
$s > 0$), $\theta_0 = m_0$, and $\theta_1 = m_1$, we get

\[
\begin{aligned}
L(s,h,k) &= -\Big\{ \log m_0 \sum_{i,j} I_0[i,j] + (1/m_0) \sum_{i,j} I_0[i,j]\, s_{ij}
            + \log m_1 \sum_{i,j} I_1[i,j] + (1/m_1) \sum_{i,j} I_1[i,j]\, s_{ij} \Big\} \\
         &= -\{ n_0 \log m_0 + g_0/m_0 + n_1 \log m_1 + g_1/m_1 \},
\end{aligned}
\]

where $n_a = \sum_{i,j} I_a[i,j]$, and $g_a = \sum_{i,j} I_a[i,j]\, s_{ij}$,
$a = 0, 1$.

There are two nuisance parameters in the likelihood function, $m_0$ and
$m_1$. To eliminate them, we replace them with their MLE's conditioned on
the correct overlay point being $(h,k)$. Setting partial derivatives to
zero,

\[
\partial L / \partial m_a = -n_a/m_a + g_a/m_a^2 = 0,
\quad \text{or} \quad \hat{m}_a = g_a/n_a \quad (a = 0, 1).
\]

Replacing $m_a$ by $\hat{m}_a$,

\[ L(s,h,k) = -\{ n_0 \log(g_0/n_0) + n_1 \log(g_1/n_1) + n_0 + n_1 \}. \]

Note that $n_0 + n_1 = \sum_{i,j} I_0[i,j] + \sum_{i,j} I_1[i,j] = qp$, a
constant. Hence maximizing $L(s,h,k)$ is equivalent to minimizing

\[
n_0 \log(g_0/n_0) + n_1 \log(g_1/n_1)
  = \log\{ (g_0/n_0)^{n_0} (g_1/n_1)^{n_1} \},
\]

which in turn is equivalent to minimizing

\[ (g_0/n_0)^{n_0}\, (g_1/n_1)^{n_1}. \]
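
The search implied by this statistic can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
authors' implementation: the function name `mle_overlay` is hypothetical,
the search is restricted to overlay points at which the sensor image lies
wholly inside the template, all grey levels are assumed strictly positive
(as for exponential data), and a region with no pixels is taken to
contribute zero to the log statistic.

```python
import numpy as np

def mle_overlay(template, sensor):
    """Return ((h, k), stat): the 1-based overlay point minimising
    n0*log(g0/n0) + n1*log(g1/n1), and the minimum value found."""
    n, m = template.shape
    q, p = sensor.shape

    def term(count, total):
        # Assumption: an empty region contributes nothing to the statistic.
        return count * np.log(total / count) if count > 0 else 0.0

    best_stat, best_hk = np.inf, None
    for h in range(n - q + 1):          # zero-based row of t_hk under S_11
        for k in range(m - p + 1):      # zero-based column
            in_r1 = template[h:h + q, k:k + p] == 1   # indicator I_1[i, j]
            n1 = int(in_r1.sum())
            n0 = q * p - n1
            g1 = sensor[in_r1].sum()    # sum of grey levels over R_1
            g0 = sensor[~in_r1].sum()   # sum of grey levels over R_0
            stat = term(n0, g0) + term(n1, g1)
            if stat < best_stat:
                best_stat, best_hk = stat, (h + 1, k + 1)
    return best_hk, best_stat
```

Working with the logarithm rather than the product itself avoids overflow
when $qp$ is large; the two have the same minimizer.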
Section 4.3 Conclusions

The statistic arrived at in this paper is different from Novak's
edge ratio statistic. It would be of interest to compare their
performance on Novak's problem, even though this would entail altering the
MLE statistic. An analysis of the asymptotic behavior of the MLE
statistic, as well as a comparison of the MLE of the overlay point for a
variety of distributions, would shed light on the practicality of our
approach to the problem. The elimination of the nuisance parameters in
the maximum likelihood estimate requires justification. We hope to soon
complete our analysis of the effect of replacing these parameters by
their conditioned MLE's. Our results indicate that the convergence to
the true parameter values is exponential, thus providing a high level
of confidence in the estimates.

Section 4.4 Interpolation Experiments

The maximum likelihood estimation procedure for detecting
corner location on a pixel level suggests the possibility of extending
this analysis to give a maximum likelihood subpixel estimate for corner
or intersection detection. We intend to examine this possibility in the
second phase of our work. As a prelude to this work, we performed
experiments to determine the subpixel accuracy attainable using
interpolation of the correlation function with the synthetic corner images.
The results of those experiments will be compared with the maximum
likelihood estimates obtained in future work.

The generated imagery consisted of a dark rectangle (as in Sec. 4.1)
forming the upper right hand quadrant of the image. The rectangle
was shifted in the x and y directions by uniform random shifts of less
than a pixel. The rectangle was then rotated by 0°, 22.5°, and 45° to
give three types of reference images. Grey levels in the dark and light
regions were generated from Gaussian distributions with different means.
Gaussian noise was then added to the entire image. A 20x20 reference
image and a 15x15 sensor image were used. The sensor image was
correlated against the reference image to get correlation points in a 5x5
neighborhood of the center pixel. A biquadratic polynomial was then
fit to this neighborhood and the peak of the polynomial was located.
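
The surface-fitting step can be sketched as follows. The text does not
give the exact polynomial form, so this sketch assumes the six-term
second-degree surface $z = a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 xy + a_5 y^2$
fitted by ordinary least squares; the function name `quadratic_peak` and
the choice of coordinates centred on the middle pixel are likewise
assumptions made for illustration.

```python
import numpy as np

def quadratic_peak(c):
    """Fit a second-degree surface to a square, odd-sized neighbourhood c of
    correlation values and return the stationary point (x, y) of the fit,
    in pixel units relative to the centre pixel."""
    half = c.shape[0] // 2
    # Row index is y, column index is x, both centred on the middle pixel.
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x, y, z = x.ravel(), y.ravel(), np.asarray(c, dtype=float).ravel()
    A = np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y]).astype(float)
    a, *_ = np.linalg.lstsq(A, z, rcond=None)
    # Setting the gradient to zero gives a 2x2 linear system for (x, y).
    H = np.array([[2.0 * a[3], a[4]],
                  [a[4], 2.0 * a[5]]])
    return np.linalg.solve(H, -a[1:3])
```

The returned point is a maximum only when the fitted surface is concave;
a practical routine would check the sign of the second-order terms before
accepting it as the subpixel offset.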

For each rectangle angle (0°, 22.5°, 45°), one hundred offsets were
generated and the offset was estimated using the above procedure. The
mean and variance of the error were computed. The mean and variance
of the error, assuming the center of the pixel was the estimate, were
also computed. The results are given in Table 4.1. Note that in each
case the interpolation gave a larger mean error than that obtained by
selecting the center of the pixel.

The results of this limited experimentation indicate that even at
low noise levels, the interpolation procedure provides low accuracy
on the model imagery. During the follow-on work, we wish to extend
the maximum likelihood estimates to the subpixel case and compare with
these experimental results. We then wish to extend these results to
edge images obtained from these synthetic images. The failure of
interpolation in the experiments should not be viewed as a condemnation of
the methods, for much of the application of these methods is on
edge-enhanced imagery.

Section 5.0 A Comparison of Correlation, LSE and MLE for Image Matching

The most common methods of image matching are least squares
estimation, maximum likelihood estimation, and correlation. Authors in the
field (e.g., [Ho-Ba] and [Pr]) often claim that for their applications, two
or more of these methods can be assumed equivalent. The exact conditions
under which these equivalences hold are seldom presented. This paper is
written to fill this lacuna.

For  the  purpose  of  conciseness,  all  definitions  contained  in  this 
paper  are  presented  forthwith.  They  are  taken  from  [Ka-Ta]  and 
[Ro-Ka] . 

A function, call it $R$, from the xy-plane to the real line is a
discrete random field if at each lattice point $(i,j)$ of the plane, $R(i,j)$
is a random variable defined on the probability space $(\Omega, F, P)$. Thus at
each $(i,j)$, $R(i,j)$ is a function from $\Omega$ to the real line. This can be
made explicit by denoting $R$ as a function of three variables $R(i,j,\omega)$,
where $\omega \in \Omega$. At each $(i,j)$ the expectation of $R(i,j)$ is

\[ E[R(i,j)] = \int_{\Omega} R(i,j,\omega)\, dP(\omega). \]

Since no confusion can arise from deleting $\omega$, from now on we denote $R$
as a function of two variables only.

The discrete random field $R$ is homogeneous (or wide-sense stationary) if

(i) $E[R(i,j)] = \mu < \infty$, where $\mu$ is independent of $(i,j)$

and

(ii) for all integers $i_1, i_2, j_1, j_2, \alpha$, and $\beta$,

\[ E[R(i_1,j_1) R(i_2,j_2)] = E[R(i_1+\alpha, j_1+\beta)\, R(i_2+\alpha, j_2+\beta)] < \infty. \]

It follows from (ii) that there is a function $r$, which depends only on
$\alpha$ and $\beta$, such that

\[ r(\alpha,\beta) = E[R(i+\alpha, j+\beta)\, R(i,j)], \tag{1} \]

for all $(i,j)$.

Let $B$ be a bounded region of the xy-plane and $n$ the number of lattice
points in $B$, and suppose $B$ grows to eventually encompass the entire plane.
The homogeneous random field $R$ is called correlation ergodic if, for every
integer pair $(\alpha,\beta)$,

\[ r(\alpha,\beta) = \lim_{n \to \infty} \frac{1}{n} \sum_{(i,j) \in B} R(i,j)\, R(i+\alpha, j+\beta). \]

The convergence is in probability. This means that the product moment
of $R(i,j)$ and $R(i+\alpha, j+\beta)$ (often called the auto-correlation) can be
approximated by taking the average, shown on the right side of the equation,
over a sufficiently large bounded region.
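
As a minimal sketch of this spatial average, assuming the bounded region
$B$ is a rectangular array and that only lattice points for which both
factors fall inside the array are included (the function name
`sample_autocorrelation` is hypothetical):

```python
import numpy as np

def sample_autocorrelation(R, alpha, beta):
    """Average of R(i, j) * R(i + alpha, j + beta) over every lattice point
    (i, j) for which both factors lie inside the array R."""
    rows, cols = R.shape
    i0, i1 = max(0, -alpha), min(rows, rows - alpha)
    j0, j1 = max(0, -beta), min(cols, cols - beta)
    a = R[i0:i1, j0:j1]
    b = R[i0 + alpha:i1 + alpha, j0 + beta:j1 + beta]
    return float(np.mean(a * b))
```

For a correlation ergodic field this average approaches $r(\alpha,\beta)$ as
the array grows; for a finite array it is only an estimate.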

The restriction of a discrete random field $T$ to a bounded region $B$
is called an image. Usually the region of restriction, $B$, need not be
explicitly indicated, so to increase the readability of equations, the same
symbol, say $R$, will be used to denote an image and the discrete random field
from which the image is derived. Hence $B_R$ refers to the region of
restriction of the image $R$. The value that the random variable $R(i,j)$,
$(i,j) \in B_R$, assumes is called the grey level of the image $R$ at the point
$(i,j)$. Before continuing, we point out that throughout this paper, the
variables $i$, $j$, $\alpha$, and $\beta$ can assume integer values only. Lastly,
the notation $|B_R|$ refers to the number of lattice points in the region $B_R$.

Suppose $R$ and $S$ are images with $|B_S| \ll |B_R|$; we call $R$ the reference
image and $S$ the sensor image. The least squares estimate (LSE) of the
match point between $S$ and $R$ is any point $(\alpha,\beta)$ at which

\[ L(\alpha,\beta) = \sum_{(i,j) \in B_S} [S(i,j) - R(i+\alpha, j+\beta)]^2 \tag{2} \]

is a minimum. The correlation estimate of the match point is any point
$(\alpha,\beta)$ at which

\[
C(\alpha,\beta) =
\frac{\sum_{(i,j) \in B_S} S(i,j)\, R(i+\alpha, j+\beta)}
     {\Big[ \sum_{(i,j) \in B_S} S^2(i,j) \sum_{(i,j) \in B_S} R^2(i+\alpha, j+\beta) \Big]^{1/2}}
\tag{3}
\]

is a maximum.

In practice, it is desirable to render the correlation estimate
independent of a uniform shift in the grey level of either $S$ or $R$. Hence
correlation is applied to $S'$ and $R'$,

\[ S'(i,j) = S(i,j) - \frac{1}{n_S} \sum_{(i,j) \in B_S} S(i,j) \]

and

\[ R'(i,j) = R(i,j) - \frac{1}{n_R} \sum_{(i,j) \in B_R} R(i,j), \]

where $n_S$ is the number of pixels (i.e., lattice points) in $B_S$ and $n_R$ is
the number in $B_R$. This transformation is presumed throughout the remainder
of this paper.

Section 5.1 Correlation and LSE

A sufficient condition that (2) and (3) give rise to the same match
point is that

\[ \sum_{(i,j) \in B_S} R^2(i+\alpha, j+\beta) \tag{4} \]

be constant in $(\alpha,\beta)$.

[Ho-Ba] and [Pr] claim that if (4) varies slowly as the sensor
image $S$ moves over $R$, then (4) can be assumed essentially constant
and ignored. The vagueness of the condition 'varying slowly'
can be replaced with the rigor of the following definition.

We say the discrete random field $R$ is almost constant if for each
$\epsilon > 0$ there exists some integer $M$ such that for every $n > M$ and for
every bounded region $B$ with $n$ lattice points

\[ P\Big\{ \Big| r(0,0) - \frac{1}{n} \sum_{(i,j) \in B} R^2(i,j) \Big| > \epsilon \Big\} < \epsilon. \tag{5} \]

Note that (5) requires that (4) be a convergent (in $P$) sequence for all
$(\alpha,\beta)$, and that the rate of convergence be uniformly (in
$(\alpha,\beta)$) bounded from below.

If $R$ is an uncorrelated and identically distributed discrete random
field with finite fourth moments, i.e., for all $(i,j)$, $E[R^4(i,j)] < \infty$,
then $R$ is homogeneous, and by (1) $E[R^2(i,j)] = r(0,0)$, a constant. By a
variant of the Law of Large Numbers (see [Ch]), $R$ is almost constant.

It  is  worthwhile  pointing  out  that  a homogeneous  random  field  R is 
not  necessarily  almost  constant,  and  in  fact  even  if  R is  correlation 
ergodic  it  need  not  be  almost  constant. 

Suppose $R$ is an almost constant discrete random field and $S$ is any
image. Clearly the LSE and correlation estimate of the match point of $R$
and $S$ need not be the same. However, LSE and correlation are equivalent in
the sense of the following theorem. A few definitions first. $X_c$ (a
two-dimensional vector) is a point at which the correlation formula (3)
attains its maximum, and $X_s$ is a point at which the least squares formula
(2) attains its minimum.
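
To make the two estimates concrete, the following sketch evaluates
formulas (2) and (3) by exhaustive search. It is an illustration only:
the function name `match_points` is hypothetical, offsets are zero-based
and restricted to positions where $S$ lies wholly inside $R$, and the
mean-removal transformation to $S'$ and $R'$ is assumed to have been
applied by the caller if it is wanted.

```python
import numpy as np

def match_points(S, R):
    """Return (x_s, x_c): the offset minimising the least squares sum (2)
    and the offset maximising the normalised correlation (3)."""
    q, p = S.shape
    n, m = R.shape
    ss = np.sum(S ** 2)
    best_L, best_C = np.inf, -np.inf
    x_s = x_c = None
    for a in range(n - q + 1):
        for b in range(m - p + 1):
            W = R[a:a + q, b:b + p]            # R(i + alpha, j + beta) over B_S
            L = np.sum((S - W) ** 2)           # formula (2)
            denom = np.sqrt(ss * np.sum(W ** 2))
            C = np.sum(S * W) / denom if denom > 0 else -np.inf   # formula (3)
            if L < best_L:
                best_L, x_s = L, (a, b)
            if C > best_C:
                best_C, x_c = C, (a, b)
    return x_s, x_c
```

Expanding the square in (2) shows why the two can differ: $L$ contains the
term (4), which the correlation formula divides by rather than adds, so
the estimates agree when (4) is constant over the offsets searched.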

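Theorem 5.1

If there is an $\epsilon > 0$ such that

\[ P\Big\{ C(X_c) > \Big( \frac{4r+\epsilon}{4r-\epsilon} \Big)^{1/2} C(X) \Big\} > 1 - \tfrac{1}{4}\epsilon \tag{6} \]

for all $X \ne X_c$ (note $r = r(0,0)$)

and

\[ P\big\{ L(X_s) < L(X) - \tfrac{1}{2} n \epsilon \big\} > 1 - \tfrac{1}{4}\epsilon \tag{7} \]

for all $X \ne X_s$,

and if $n = |B_S|$ is sufficiently large to satisfy (5) with $\epsilon$ replaced
by $\epsilon/4$, then

\[ P(X_c = X_s) > 1 - \epsilon. \]

Proof

Let $X_0$ be a point at which

\[ U(\alpha,\beta) = \sum_{(i,j) \in B_S} S(i,j)\, R(i+\alpha, j+\beta) \]

attains its maximum, and define the following sets:

\[ \Psi \equiv \Big\{ \Big| r - \frac{1}{n} \sum_{(i,j) \in B_S} R^2(i,j) \Big| < \tfrac{1}{4}\epsilon \Big\}, \]

\[ A_a \equiv (X_0 \ne X_a), \quad a = s, c, \]

\[ \Phi_s \equiv \big\{ L(X_s) < L(X_0) - \tfrac{1}{2} n \epsilon \big\}, \]

\[ \Phi_c \equiv \Big\{ C(X_c) > \Big( \frac{4r+\epsilon}{4r-\epsilon} \Big)^{1/2} C(X_0) \Big\}, \]

and $\Gamma_a = A_a \cap \Phi_a \cap \Psi$, $a = s, c$.

On the set $\Gamma_s$,

\[
\begin{aligned}
L(X_0) &< n(r - \tfrac{1}{4}\epsilon) - 2U(X_0) + \tfrac{1}{2} n \epsilon \\
       &\le n(r - \tfrac{1}{4}\epsilon) - 2U(X_s) + \tfrac{1}{2} n \epsilon \\
       &< L(X_s) + \tfrac{1}{2} n \epsilon < L(X_0).
\end{aligned}
\]

It follows from this contradiction that $P(\Gamma_s) = 0$, hence
$P(A_s) < \tfrac{1}{2}\epsilon$.

On the set $\Gamma_c$,

\[
\begin{aligned}
C(X_0) &> \frac{U(X_0)}{[n(r + \tfrac{1}{4}\epsilon)]^{1/2}}
        \ge \frac{U(X_c)}{[n(r - \tfrac{1}{4}\epsilon)]^{1/2}}
            \Big( \frac{4r-\epsilon}{4r+\epsilon} \Big)^{1/2} \\
       &> C(X_c) \Big( \frac{4r-\epsilon}{4r+\epsilon} \Big)^{1/2} > C(X_0).
\end{aligned}
\]

Thus $P(\Gamma_c) = 0$, implying $P(A_c) < \tfrac{1}{2}\epsilon$.

Combining the above results we get:

\[
\begin{aligned}
P(X_s = X_c) &\ge P(X_s = X_0 = X_c) \\
             &= P\{ (X_s = X_0) \cap (X_0 = X_c) \} \\
             &\ge 1 - P(A_c) + 1 - P(A_s) - 1 \\
             &> 1 - \epsilon.
\end{aligned}
\]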
This theorem imposes rather strong conditions on the points $X_c$
and $X_s$; however, with minor modifications to the proof, these
conditions can be weakened. It is sufficient that the two probabilistic
inequalities (6) and (7) be true, respectively, on neighborhoods of
$X_c$ and $X_s$.

Note that no restrictions have been placed on $S$. Hence the
equivalence of LSE and correlation holds in the case

\[ S(i,j) = R(i+\alpha, j+\beta) + N(i,j), \quad (i,j) \in B_S, \]

where the noise $N(i,j)$ are iid, and $R$ satisfies the conditions of the
theorem.

We turn our attention to a reference image $R$ containing two homogeneous
regions with means $\mu$ and $\nu$. If the sensor image $S$ is offset by
$(\alpha,\beta)$, some of the sensor pixels will overlay region I of $R$ and the
remainder of the sensor pixels will overlay region II. The following
shorthand notations will be used, in which, from context, it is understood
the offset is $(\alpha,\beta)$:

\[ \sum\nolimits_{I} \equiv \sum_{\{(i,j) :\ (i+\alpha,\ j+\beta) \in \text{region I}\}} \]

and

\[ \sum\nolimits_{II} \equiv \sum_{\{(i,j) :\ (i+\alpha,\ j+\beta) \in \text{region II}\}}. \]

In this case the LSE is given by the minimum of

\[
\begin{aligned}
L(\alpha,\beta) ={}& \sum\nolimits_{I} R^2(i+\alpha, j+\beta) + \sum\nolimits_{II} R^2(i+\alpha, j+\beta)
  + \sum_{(i,j) \in B_S} S^2(i,j) \\
 & - 2 \sum\nolimits_{I} S(i,j)\, R(i+\alpha, j+\beta) - 2 \sum\nolimits_{II} S(i,j)\, R(i+\alpha, j+\beta),
\end{aligned}
\]

and the correlation estimator by the maximum of

\[
C(\alpha,\beta) =
\frac{\sum\nolimits_{I} S(i,j)\, R(i+\alpha, j+\beta) + \sum\nolimits_{II} S(i,j)\, R(i+\alpha, j+\beta)}
     {\Big\{ \sum_{(i,j) \in B_S} S^2(i,j) \Big[ \sum\nolimits_{I} R^2(i+\alpha, j+\beta)
       + \sum\nolimits_{II} R^2(i+\alpha, j+\beta) \Big] \Big\}^{1/2}}.
\]

As before we require that

\[ D(\alpha,\beta) = \sum\nolimits_{I} R^2(i+\alpha, j+\beta) + \sum\nolimits_{II} R^2(i+\alpha, j+\beta) \]

be approximately constant with respect to $(\alpha,\beta)$ in order to ensure that
LSE and correlation give rise to the same match point. This condition is
satisfied in the following circumstances.

If $S$ is large, then shifting the offset by a few pixels will not
drastically alter $D(\alpha,\beta)$, since the set of pixels included in
$D(\alpha,\beta)$ remains essentially unchanged. But the cross-product term (the
numerator in the correlation function) will change, because all product
terms are different. Thus in a neighborhood of, say, the LSE match point,
LSE and correlation will result in the same solution.

Suppose $R$ is restricted to being a binary image, whereas $S$ remains a
grey level image. If $R$ consists of two contiguous regions, then matching
$S$ and $R$ is equivalent to finding an edge, of known shape and size, in $S$.
This edge separates two homogeneous regions with different mean grey levels.

We can assign to the pixels of $R$ a conditional estimate of the means
in each region. If $S$ is offset by $(\alpha,\beta)$, then the conditional
estimates of $\mu$ and $\nu$ are

\[ \mu_{\alpha\beta} \equiv \frac{1}{n(I)} \sum\nolimits_{I} S(i,j) \]

and

\[ \nu_{\alpha\beta} \equiv \frac{1}{n(II)} \sum\nolimits_{II} S(i,j). \]

Here $n(I)$ and $n(II)$ are the numbers of pixels of the sensor image which
overlay, respectively, regions I and II of $R$ at offset $(\alpha,\beta)$. The
dependency of $n(I)$ on $(\alpha,\beta)$ has been suppressed, although it is
implicitly understood. Thus, at offset $(\alpha,\beta)$, the reference pixels in
region I are assigned the value $\mu_{\alpha\beta}$, and those in region II
$\nu_{\alpha\beta}$.

The correlation at offset $(\alpha,\beta)$ is given by

\[
C(\alpha,\beta) =
\frac{\big[ n(I)\, \mu_{\alpha\beta}^2 + n(II)\, \nu_{\alpha\beta}^2 \big]^{1/2}}
     {\big[ \sum_{(i,j) \in B_S} S^2(i,j) \big]^{1/2}}
\]

and the LSE is the minimum of

\[ L(\alpha,\beta) = \sum_{(i,j) \in B_S} S^2(i,j) - n(I)\, \mu_{\alpha\beta}^2 - n(II)\, \nu_{\alpha\beta}^2. \]

In this instance, then, correlation and LSE are equivalent.
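
The equivalence can be checked numerically. The sketch below is an
illustration under stated assumptions: the function name
`binary_reference_scores` is hypothetical, the binary reference is coded
0 for region I and 1 for region II, and offsets are zero-based. Writing
$F = n(I)\mu_{\alpha\beta}^2 + n(II)\nu_{\alpha\beta}^2$, the two formulas above
give $C^2 = F / \sum S^2$ and $L = \sum S^2 - F$, so $C^2 = 1 - L/\sum S^2$ and
the offset that maximizes $C$ also minimizes $L$.

```python
import numpy as np

def binary_reference_scores(S, T, a, b):
    """Overlay sensor S on binary reference T at offset (a, b), assign each
    region its conditional mean, and return (C, L) from the closed forms."""
    q, p = S.shape
    in_II = T[a:a + q, b:b + p].astype(bool)   # True where region II is overlaid
    n_I, n_II = int((~in_II).sum()), int(in_II.sum())
    mu = S[~in_II].mean() if n_I else 0.0      # conditional estimate for region I
    nu = S[in_II].mean() if n_II else 0.0      # conditional estimate for region II
    ss = np.sum(S ** 2)
    fit = n_I * mu ** 2 + n_II * nu ** 2
    C = np.sqrt(fit / ss)
    L = ss - fit
    return C, L
```

Because both scores are monotone functions of the single quantity $F$, a
search over offsets needs to evaluate only $F$.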
Section 5.2 LSE and MLE

An  advantage  of  least  squares  estimation  and  correlation  estimation 
is  that  they  are  distribution  independent,  whereas  maximum  likelihood 
estimation  is  highly  distribution  dependent.  In  order  to  use  MLE,  more 
stringent  requirements  must  be  imposed  on  the  underlying  model,  often 
rendering  it  less  realistic. 

Suppose $S$ and $R$ are related as follows,

\[ S(i,j) = R(i+\alpha, j+\beta) + N(i,j), \]

where $(\alpha,\beta)$ is the offset we seek, and the $N(i,j)$ are iid Gaussian
with mean 0 and variance $\sigma^2$. The log likelihood function of $S$ is

\[
-\frac{n_S}{2} \log(2\pi\sigma^2)
 - \frac{1}{2\sigma^2} \sum_{(i,j) \in B_S} [S(i,j) - R(i+\alpha, j+\beta)]^2.
\]

This expression attains a maximum when

\[ \sum_{(i,j) \in B_S} [S(i,j) - R(i+\alpha, j+\beta)]^2 \]

attains a minimum. This is, of course, the LSE. Note that $R$ is not a
random field, and because at the correct offset $(\alpha_0,\beta_0)$

\[ E[S(i,j)] = R(i+\alpha_0, j+\beta_0), \]

$S$ is, in general, not a homogeneous random field.

If $R$ is a binary image, as described at the end of Section 5.1,
then MLE is equivalent to LSE which is equivalent to correlation.

Section 6.0 Conclusions and Future Work

We  have  developed  statistical  and  geometric  models  for  subpixel 
accuracy.  Using  a restrictive  geometric  model,  we  were  able  to  derive 
bounds  on  subpixel  accuracy.  These  bounds  are  useful  for  both  error 
prediction  and  for  selection  of  features  for  registration.  Under  the 
assumptions  of  our  model,  a high  level  of  subpixel  accuracy  is  possible. 

We  are  currently  extending  these  results  to  more  realistic  models. 

Several bounds on registration accuracy were derived under the
assumption of statistical models for the images and noise. Two cases
were considered. First, the reference and sensed images were assumed to
be continuous and bounds on the offset error were derived. In the second
model, it was assumed that the image was digitized and that a registration to
the correct pixel is available. In addition, a consistent maximum
likelihood estimator was developed for corner detection under a stochastic
model for such features. Finally, conditions were established under
which maximum likelihood, correlation and least squares methods for image
matching are equivalent.

The extension and testing of our geometric modeling methods will be
a key part of our continuing work. The level of subpixel accuracy
attainable under our restricted model was sufficiently high to warrant
detailed investigation of less restrictive models. For the case of the
digitization of a real line, we will complete the probabilistic
analysis using the invariant measure on lines for several lengths of digital
lines. This will give more realistic information on subpixel accuracy.
The subpixel accuracy attainable will be shown to be even better than
our present results indicate since we have chosen a worst case bound.
We will also examine the case of a digital angle formed by two digital
lines intersecting at a specified angle. Once again, this situation,
which models road intersections, can only improve the subpixel accuracy.

The case of digital lines with points missing will first be
investigated experimentally. Using the methods outlined in Section 2.6, we can
compute bounds on the offset estimation error. The bounds derived in
this manner will be adequate to describe this more general model, but we
will attempt to model this situation to aid us in the still more general
models. Our most general model, in which points are missing and
extraneous points are added to the digital line, will be investigated next. The
exact form of this study will depend upon the previous results.

We  will  experiment  with  LANDSAT  and  simulated  data  to  estimate  the 
accuracy  to  which  we  can  detect  the  pixels  on  a digital  line.  Using  these 
observations  we  will  develop  each  accuracy  model  to  be  used  in  evaluating 
the  set  of  all  digital  lines  to  determine  procedures  for  selection  of 
good  registration  features,  e.g.,  which  line  slopes  are  best. 

Experimentation  will  be  necessary  to  determine  the  usefulness  of 
the  statistical  bounds  developed  in  Section  3.  We  review  briefly  the 
proposed  work  in  this  direction. 

(i) For large continuous sensed images with conditionally Gaussian noise
given the reference image (see Appendix), formula (A) in Section 3 bounds
the probability of misregistration by more than distance T. Numerical work
with pixel-discretized real and simulated images is needed to test the
validity and usefulness of the bound.

(ii) When translation-registration to the nearest pixel has already been
accomplished, and all imagery can be assumed spatially homogeneous with
rapidly decaying correlations on the pixel distance-scale h, the
estimator ic from Section 3 approximately limits the subpixel accuracy possible
if the sensed and reference images were infinitely large with noise- and
reference-images stochastically independent. Again, numerical
experimentation will empirically determine whether these assumptions and
figures-of-merit are valid or useful.

(iii) The kriging-interpolation and maximization of $C(\cdot)$ should certainly
be tried, as sketched at the end of Section 3, using simple parametric
forms for the covariances of $Z_K$ and $Z_o$.

(iv) Finally, if the experiments in (i) - (iii) prove successful,
theoretical and empirical extensions of this work, to the case of
registration with respect to affine distortion conditions of [Mo-Sm], seem
both desirable and possible.

The corner detector, used to locate a highly reliable match point
for registration, will be studied. This study will consist of analytical
modeling for subpixel accuracy as well as experimental studies.

ABSTRACT

Complete  sensor/platform  modelling  is  derived  and  used  for  the 
generation  of  synthetic  data  and  for  rectification  studies  of  satellite 
scanner  data.  All  satellite  position  and  sensor  attitude  parameters  are 
recovered.  Rectification  accuracy  improves  marginally  when  using  more 
than  25  control  points,  and  is  highly  sensitive  to  errors  in  image 
point identification.

1. INTRODUCTION
1.1 General

Remote sensing imagery produced by various sensors, such as frame
cameras, scanners, etc., may be considered as a transformation of the
object space, e.g. the ground surface, into the image space, which may be
a plane, a cylindrical surface, etc. Scanner imagery, with which this
paper is concerned, is the result of transforming the three-dimensional
ground surface into an equivalent cylindrical surface, which when developed
becomes a two-dimensional image space.

Rectification  is  essentially  the  process  of  defining  the  inverse 
transformation  which  will  allow  us  to  recover  the  ground  surface  from 
corresponding  imagery.  We  can  fully  recover  the  ground  surface  from 
imageries  only  if  we  have  multiple  coverage  of  the  same  ground  area  from 
different  acquisition  locations.  Since  the  inverse  transformation  is 
from  a two-dimensional  surface  (the  imagery)  into  a three-dimensional 
surface  (the  ground),  rectification  using  single  coverage  imagery 
requires  that  one  of  the  three-dimensions  of  the  ground  space,  usually 
the  elevation,  be  known  or  assumed  known  a-priori. 

Another process, which is very similar to rectification, is
registration. In rectification, we determine the ground position of
points in a given imagery, while in registration, we locate these points
on other imageries covering the same area. The effectiveness of
registration depends on how close to each other the acquisition
points of the different imageries are. Because rectification and
registration are very similar, methods suited for one can be applied to the
other, with slight modifications.

As is well known, an imagery consists of picture elements called
pixels. If the position of the exposure station, i.e. the platform
(satellite) position, and the direction of the vector from the exposure
station to the pixel are known, its ground position can be derived. In
general, every pixel is imaged at a different time, hence a given pixel
has a unique exposure station and a unique vector direction. If the
satellite positions corresponding to all pixels and all pixel
directions are known to the required accuracy, the problem of rectifying an
image is solved. Unfortunately, either because of cost, because it is
not technically possible, or both, the position of the satellite, or
the ephemeris, and the direction of pixel vectors are not available
with the required accuracy.

An alternative procedure for rectifying imagery is through the
use of ground control points. These are points the positions of which
are known both in the imagery and on the ground. A mathematical model
exists which relates the position of a point on the imagery, the
corresponding satellite position, pixel vector direction and ground
position. Suppose there are points with known positions both in the
imagery and on the ground (control points); then presumably, using
the mathematical model, we can solve for the satellite position and the
pixel vector direction. This is only possible if the satellite position
and pixel vector directions are expressed in parametric form, since each
pixel has a unique direction and a unique corresponding satellite
position. A pixel vector direction can be broken down into two
components, namely, the attitude or orientation of the sensor coordinate
system and the direction of the pixel with respect to the sensor
coordinate system, which is usually observed. Then only the sensor
attitude need be modelled in parametric form.

This approach to rectification has three main elements: (1) the
type of mathematical model used, (2) the method of adjustment used,
and (3) the manner in which a-priori information is exploited.

1.2  Mathematical  Models  Used  for  Rectification 
The two main types of models are the implicit and explicit models.
The implicit model relates the point on the imagery to the
corresponding point on the ground using parameters that have no direct physical
significance, i.e., satellite position and sensor attitude cannot be
derived from these parameters. These types of models are more commonly
known as interpolative or surface fitting models. The explicit model,
on the other hand, relates the point on the imagery to the point on the
ground using parameters that have real physical meaning. These
parameters include either the satellite position and sensor attitude
themselves, or other parameters which are related to them. The group of
explicit models are commonly known as parametric models. Each of the
two types of models is discussed separately.

1.2.1  Interpolative  or  Surface  Fitting  Models 
The  most  commonly  used  model  of  the  interpolative  type  is  the 
polynomial  function.  This  includes  similarity,  affine  and  higher  order 
polynomials.  Normally,  the  ground  is  first  projected  into  a mapping 
plane.  If  necessary,  the  image  is  also  projected  into  an  equivalent 
plane. The general form of the polynomial function is as follows:

(1.1a) $X = a_0 + a_1 x + a_2 y + a_3 xy + a_4 x^2 + a_5 y^2 + a_6 x^2 y + a_7 x y^2 + \ldots$

(1.1b) $Y = b_0 + b_1 x + b_2 y + b_3 xy + b_4 x^2 + b_5 y^2 + b_6 x^2 y + b_7 x y^2 + \ldots$

where X,Y are the map coordinates, x,y are the image coordinates (or
pixel locations) and $a_0, b_0, a_1, b_1, a_2, b_2, \ldots$ are the mapping
parameters. Polynomials are global in the sense that only one set of
parameters is used for the whole image frame.
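
As an illustration of how such a global polynomial might be estimated
from control points, the sketch below fits the eight terms written out
in (1.1a) and (1.1b) by ordinary least squares. The truncation at eight
terms, the use of unweighted least squares, and the function names
`poly_terms`, `fit_mapping` and `apply_mapping` are assumptions made for
this example, not a description of any particular rectification system.

```python
import numpy as np

def poly_terms(x, y):
    """Design matrix whose columns are 1, x, y, xy, x^2, y^2, x^2 y, x y^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.column_stack([np.ones_like(x), x, y, x * y,
                            x ** 2, y ** 2, x ** 2 * y, x * y ** 2])

def fit_mapping(x, y, X, Y):
    """Least squares estimates of (a_0..a_7) and (b_0..b_7) from control
    points with image coordinates (x, y) and map coordinates (X, Y)."""
    A = poly_terms(x, y)
    a, *_ = np.linalg.lstsq(A, np.asarray(X, dtype=float), rcond=None)
    b, *_ = np.linalg.lstsq(A, np.asarray(Y, dtype=float), rcond=None)
    return a, b

def apply_mapping(a, b, x, y):
    """Map image coordinates to map coordinates with fitted parameters."""
    A = poly_terms(x, y)
    return A @ a, A @ b
```

At least eight well-distributed control points are needed for the eight
parameters of each equation; the residuals at the control points are the
differences between `apply_mapping` output and the known map coordinates.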

If the density of the control points is high, global functions
might not be appropriate. Then the frame might be divided into segments
and a different polynomial function applied to each segment. If
conditions of continuity are enforced at the boundaries of the different
segments, the approach becomes known as the method of splines.

A totally  different  approach  applicable  also  if  the  control  point 
density  is  high,  is  the  method  of  moving  averages.  In  this  method  a 
different  polynomial  is  used  for  every  point  to  be  interpolated.  Each 
polynomial  is  centered  on  the  point  of  interest.  The  degree  of  each 
polynomial  might  be  low  and  the  effective  area  might  be  small  but  still 
this  method  is  computationally  expensive. 

After rectifying an image, the residuals, or differences between
computed and observed coordinates of control points, can be calculated.
Again if the density of the control points is high, it may be desirable
to perform additional processing to reduce the magnitude of the
residuals. The method of linear least squares prediction is best suited
for this type of processing. The method assumes that the residuals
belong to a random field.

1.2.2 Parametric Models

Parametric models follow closely the geometric and physical
processes which produced the imagery. Because of this, parametric modelling
can be logically divided into sensor modelling and platform modelling.
Parametric models also depend on the assumed figure of the earth surface.

Sensor models reflect the type of sensor used. They are
independent of the platform (satellite) used and the type of surface being
imaged (e.g. earth). The results of sensor modelling are either
corrected sensor vector directions corresponding to each pixel, or pixel
positions projected on a plane. For scanner type sensors, projection
of pixel positions on a plane corrects for the panoramic effect. Other
applicable corrections are due to non-linearity of scanning, unequal
number of pixels per scan, and the effect of the scan line corrector (for
Thematic Mapper only). Sensor modelling is sometimes known as internal
modelling.

The platform model describes the behavior of the satellite which
is the platform for imaging. Platform modelling primarily consists of
two parts: sensor attitude modelling, and satellite position and orbit
modelling. Attitude models can be polynomials, harmonic series or
auto-regressive models. The independent parameter for attitude models is
usually time. Satellite position and orbit models can be grouped into
three general types. The first group defines the satellite position in
terms of the satellite position vector, and the satellite orbit is
defined in terms of both the satellite position and velocity vectors.
Both vectors vary with time. The second group defines the satellite
orbit in terms of the five orbital parameters as defined in orbital
mechanics. In this case, these orbital parameters vary with time. The
satellite position is defined in the orbital plane as a function of
time. The third group is similar to the second in the sense that the
satellite orbit is also defined in terms of orbital parameters and
that the satellite position in the orbital plane is also defined as a
function of time. The main difference is that in the last group, the
orbital parameters are independent of time, i.e., they are constant for
a given frame. As a consequence, the shape of the orbit has to be
defined. The shape of the orbit can be assumed to be a straight line,
a circle or an ellipse. As a further consequence of assuming the
orbital parameters constant, the deviation of the actual satellite
position from its position computed using the orbital parameters
has to be modelled. Satellite position deviation models can be
polynomials, harmonic series or auto-regressive models similar to the
attitude models. Again these models are functions of time.

The last element in parametric modelling pertains to the assumed
shape of the earth. The shape of the earth is important because no
computation can be done on its surface unless its shape is known. For
purposes of rectification, the surface of the earth can be a map
projection plane, a sphere or an ellipsoid.

1.2.3  Other  Model  Considerations 

Given a selected model with redundant data, an adjustment method
is applied. There are two types of adjustment currently in use: the
least squares method and the Kalman filter approach. The former is a batch
type of adjustment. All observations are adjusted in one pass and the
parameter estimates are then computed. Inherent in this method is the
assumption that the model is fixed. The second approach is inherently
sequential in nature. Observations are incorporated into the
adjustment in small groups. The precision of the parameter estimates
increases up to a certain limit as the number of observations incorporated
into the adjustment increases. The model used in this adjustment is
considered random, hence it gets adjusted together with the observations.

During  rectification  adjustment  using  ground  control  points,  the 
sensor  attitude  and  satellite  position  parameters  are  unknown.  In 
reality,  some  or  all  of  these  parameters  may  be  measured  but  to  a 
precision  which  is  inadequate  for  rectification.  These  measurements, 
and  others  that  are  related  to  them,  constitute  a-priori  information. 
Instead  of  using  these  measurements  as  initial  approximations  for  the 
corresponding  parameters,  they  are  used  as  a-priori  estimates  with 
proper  a-priori  covariance  matrices.  In  this  manner,  they  are  allowed 
to  vary  in  the  adjustment.  The  amount  of  variation  is  commensurate 
with  the  a-priori  variances  and  covariances. 
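
As a minimal sketch of this idea, assume a single parameter that is
observed directly several times and that also has an a-priori
measurement. The a-priori value then enters the least squares adjustment
as one more observation, weighted by the inverse of its variance. The
function name and the numerical values are illustrative assumptions and
are not taken from the investigation.

```python
def adjust_with_prior(observations, obs_variance, prior_value, prior_variance):
    """Weighted least squares estimate of one directly observed parameter.

    The a-priori value is treated as an extra observation whose weight is
    the inverse of its a-priori variance (hypothetical scalar example).
    """
    w_obs = 1.0 / obs_variance      # weight of each direct observation
    w_prior = 1.0 / prior_variance  # weight of the a-priori estimate
    normal = len(observations) * w_obs + w_prior
    rhs = sum(observations) * w_obs + prior_value * w_prior
    estimate = rhs / normal
    return estimate, 1.0 / normal   # estimate and its variance


# A loose prior lets the observations dominate; a tight prior holds the
# parameter close to its a-priori value.
loose, _ = adjust_with_prior([10.2, 9.8], 0.04, 12.0, 1e6)
tight, _ = adjust_with_prior([10.2, 9.8], 0.04, 12.0, 1e-9)
```

With a very large a-priori variance the estimate tends to the mean of
the observations; with a very small one it stays close to the a-priori
value, which is the behavior described above.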

1.3  Review  of  Literature 

The earliest and easiest approach to rectification of satellite
scanner data is the use of polynomial models. Many authors have
reported that the resulting accuracy is comparable to other methods
(Forrest [10], Trinder [20], Bahr [3], Dowman [7]). Because of their
reported accuracy and ease of use, polynomials are still
the most commonly used rectification method.

The earliest parametric model applied to satellite scanner data
assumes that the orbit is a straight line and that the earth surface
is projected onto a mapping plane (Kratky [12], Konecny [11], Dowman
[7]). In effect, the treatment of satellite scanner data is the same
as that of aircraft scanner data. Parameters describing the variations
in attitude and elevations were recovered.

The next improvement in parametric modelling is due to Caron and
Simon. They defined the satellite orbit and position in terms of
satellite position and velocity vectors (Caron and Simon [6], Puccinelli
[16]). They also did away with the use of map projection during the
adjustment process. They assumed instead that the earth is a sphere
and performed computations on its surface (Caron and Simon [6], Bahr
[4], Sawada [18]). The parameters recovered during the adjustment
were the same as those in the previous method. They are further
credited with the use of the Kalman filter to solve for the parameters in
the adjustment.

Bahr was the first to define the satellite position in terms of
orbital parameters that are functions of time (Bahr [4]). He
recommended that only parameters describing the attitude and elevation
variations should be recovered.

Next the orbit was defined in terms of constant orbital parameters.
This assumption requires that the shape of the orbit be defined and that
the deviation of the actual satellite from its predicted position be
modelled in terms of time. The shape of the orbit had been modelled as
a circle (Forrest [9], Levine [13], Synder [19]) and as an ellipse
(Bahr [4], Sawada [18]). Only Levine so far has incorporated in his
model all three components of satellite position deviation (Levine [13]).
Like the others, however, he also recommended that only the parameters
defining the variations in attitude and elevation be recovered.


Regarding the shape of the earth, a few authors have recommended
that an ellipsoid of revolution be used (Puccinelli [16], Forrest [9],
Levine [13], Synder [19]). Because of the complex nature of the
resulting formulas, no exact closed form has been derived so far.
Computations on the surface of the earth's ellipsoid involving
elevations, as recommended by the above authors, require approximations
and/or iterations.

1.4  Preview  of  the  Investigation 

The  parametric  model  derived  for  this  investigation  is  suitable 
for  Landsat  MSS  type  imagery.  With  slight  modification  of  its  sensor 
dependent  parameters,  this  model  is  also  applicable  to  TM  type  imagery. 
It  is  sufficiently  general  as  to  encompass  various  specific  cases 
published  by  other  researchers.  In  addition,  it  extends  the  modelling 
of  the  satellite  position  to  include  all  of  its  three  components,  while 
others  have  limited  consideration  to  only  one  component,  its  elevation. 
With  this  general  model,  we  are  able  to  both  generate  synthetic  data 
and  study  rectification.  This  model  is  also  used  to  study  the  effect 
on  ground  position  of  target  points  due  to  both  individual  as  well  as 
combined  errors  in  the  various  parameters. 

The major factors affecting rectification accuracy are: (1) the
type of model used, (2) the density of ground control, (3) the accuracy of
ground control, and (4) the accuracy of the derived image coordinates
or directions. Using synthetic data produced with the derived model,
we studied the effects of these different factors. The different cases
of the model used are: (a) polynomial model, (b) model with straight
line orbit and earth surface projected on a plane, (c) model with
circular orbit and spherical earth, (d) model with circular orbit and
ellipsoidal earth, and (e) model with elliptical orbit and ellipsoidal
earth. The last three models fully accounted for the satellite position
deviation (three components) and the sensor attitude (three elements).


2.  MATHEMATICAL  MODELLING 


2.1  Principles  of  Parametric  Modelling 
Figure 1 shows the geometry involved in the relationship between
image and object spaces, where:

X Y Z is the ground coordinate system;

x' y' z' is the transformed sensor coordinate system parallel to
the ground coordinate system;

S is the satellite position defined by the vector
$[X_S\ Y_S\ Z_S]^T$;

p is the pixel position defined by $[x'_p\ y'_p\ z'_p]^T$;

G is the pixel ground position defined by the vector
$[X_G\ Y_G\ Z_G]^T$;

h is the elevation of G, and

N is the radius of the prime vertical corresponding to G.

Since the two coordinate systems are parallel, then

(2.1)  $\begin{bmatrix} x'_p \\ y'_p \\ z'_p \end{bmatrix} = \lambda \begin{bmatrix} X_G - X_S \\ Y_G - Y_S \\ Z_G - Z_S \end{bmatrix}$

where $\lambda$ is a scale factor.

Let the original sensor coordinate system be x y z. This coordinate
system is not necessarily parallel to the ground system. Let $M^T$
be the transformation which rotates x y z into x' y' z'. Applying this
transformation to the original pixel coordinates results in

(2.2)  $\begin{bmatrix} x'_p \\ y'_p \\ z'_p \end{bmatrix} = M^T \begin{bmatrix} x_p \\ y_p \\ z_p \end{bmatrix}$

Substituting equation (2.2) into equation (2.1) produces the following:

(2.3)  $\begin{bmatrix} x_p \\ y_p \\ z_p \end{bmatrix} = \lambda M \begin{bmatrix} X_G - X_S \\ Y_G - Y_S \\ Z_G - Z_S \end{bmatrix}$

This equation is called the collinearity equation.

The process of deriving the pixel position vector $[x_p\ y_p\ z_p]^T$ in
the image space from pixel row and column numbers is called sensor
modelling. The process of defining the satellite position vector
$[X_S\ Y_S\ Z_S]^T$ in terms of orbital parameters, time and satellite
position deviation parameters is called orbit modelling. Orbit modelling
plus the process of defining M in terms of the orbital parameters, time,
satellite position deviation parameters and sensor attitude parameters
is called platform modelling.

Before  we  proceed,  we  will  first  list  without  proof  formulas  from 
related  fields  which  we  will  need  later  in  our  derivations. 

2.2  Formulas  from  Related  Fields 

Orbital mechanics provides us with the necessary formulas for
establishing the position of a satellite in orbit. The following
formulas assume that the earth is a sphere of uniform mass.

(2.4a)  $E - e_s \sin E = \sqrt{G M_e / A_s^3}\; t_s$

(2.4b)  $\cos v = (\cos E - e_s) / (1 - e_s \cos E)$

(2.4c)  $\sin v = \sqrt{1 - e_s^2}\, \sin E / (1 - e_s \cos E)$

(2.4d)  $R = A_s (1 - e_s \cos E)$

(2.5)  $t_s = \sqrt{A_s^3 / (G M_e)}\, \left\{ 2 \tan^{-1} \left[ \sqrt{(1 - e_s)/(1 + e_s)}\, \tan (v/2) \right] - e_s \sqrt{1 - e_s^2}\, \sin v / (1 + e_s \cos v) \right\}$

(2.6)  $\tau = 2\pi \sqrt{A_s^3 / (G M_e)}$

See Figure 2 for aid in defining the terms:

$A_s$ is the semi-major axis of the satellite orbit,

$e_s$ is the eccentricity of the satellite orbit,

R is the distance of the satellite from the earth's center,

v is the true anomaly, defined as the angle, as viewed from the
center of the earth, between the satellite and the point on the
satellite orbit nearest the earth (perigee),

$t_s$ is time, where $t_s$ is zero at perigee,

$\tau$ is the period of the satellite orbit,

E is the eccentric anomaly,

G is the gravitational constant, and

$M_e$ is the mass of the earth.

In Figure 2, O is the center of the orbit; P is the perigee; S is
the satellite; $F_1$ and $F_2$ are the foci of the elliptical satellite
orbit; $F_1$ coincides with the center of the earth; R is the distance of
the satellite from the earth's center; $A_s$ is the semi-major axis of
the satellite orbit; and v is the true anomaly.

Given $t_s$, the parameters $A_s$ and $e_s$, and the constants G and $M_e$,
the polar coordinates R and v of the satellite position can be solved
for using equations (2.4a) to (2.4d). Equation (2.4a) has to be solved
iteratively for E, the eccentric anomaly. Conversely, given v and the
same set of constants, $t_s$ can be solved for using equation (2.5).
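
The procedure just described can be sketched as follows. This is only an
illustration under stated assumptions: Newton's iteration is one common
choice for solving equation (2.4a) and is not necessarily the scheme used
in this investigation, and the function names, the value of the product
G M_e, and the orbit values in the usage lines are assumptions introduced
here. The inverse function evaluates E from v and then applies the
left-hand side of equation (2.4a).

```python
import math


def solve_kepler(mean_anomaly, ecc, tol=1e-12, max_iter=50):
    """Solve E - ecc*sin(E) = mean_anomaly for E by Newton iteration."""
    E = mean_anomaly  # adequate starting value for small eccentricity
    for _ in range(max_iter):
        delta = (E - ecc * math.sin(E) - mean_anomaly) / (1.0 - ecc * math.cos(E))
        E -= delta
        if abs(delta) < tol:
            break
    return E


def polar_position(t_s, a_s, ecc, gm):
    """Return (R, v) at time t_s after perigee, following (2.4a) to (2.4d)."""
    mean_anomaly = math.sqrt(gm / a_s**3) * t_s
    E = solve_kepler(mean_anomaly, ecc)
    denom = 1.0 - ecc * math.cos(E)
    cos_v = (math.cos(E) - ecc) / denom
    sin_v = math.sqrt(1.0 - ecc**2) * math.sin(E) / denom
    return a_s * denom, math.atan2(sin_v, cos_v)


def time_from_true_anomaly(v, a_s, ecc, gm):
    """Time since perigee for a true anomaly v in (-pi, pi)."""
    E = 2.0 * math.atan(math.sqrt((1.0 - ecc) / (1.0 + ecc)) * math.tan(v / 2.0))
    return math.sqrt(a_s**3 / gm) * (E - ecc * math.sin(E))


GM_EARTH = 3.986004418e14  # m^3/s^2, product G*M_e
# Hypothetical near-circular orbit: semi-major axis 7285 km, eccentricity 0.001.
R, v = polar_position(600.0, 7.285e6, 0.001, GM_EARTH)
t_back = time_from_true_anomaly(v, 7.285e6, 0.001, GM_EARTH)
```

The round trip from time to (R, v) and back to time is a convenient
consistency check on the two sets of formulas.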

The  next  field  where  other  required  formulas  are  available  is 
geometric  geodesy.  The  following  formulas  are  useful  for  computing  on 
the  surface  of  the  earth.  The  major  assumption  here  is  that  the  earth 
is  an  ellipsoid  of  revolution. 

(2.7a)  $N = A_e / \sqrt{1 - e_e^2 \sin^2 \phi}$

(2.7b)  $\delta N = e_e^2 N$

(2.8)  $R_{avg} = A_e \sqrt{1 - e_e^2} / (1 - e_e^2 \sin^2 \phi)$

(2.9a)  $X = (N + h) \cos\phi \cos\lambda$

(2.9b)  $Y = (N + h) \cos\phi \sin\lambda$

(2.9c)  $Z = (N + h - \delta N) \sin\phi$

Figures 3a and 3b will help clarify the following terms:

$A_e$ is the semi-major axis of the ellipsoid,

$e_e$ is the eccentricity of the ellipsoid,

$\phi$ is the geodetic latitude,

$\lambda$ is the geodetic longitude,

h is the elevation of a point,

N is the radius of the prime vertical,

$\delta N$ is that part of N below the equator for points in the
northern hemisphere and above the equator for points in
the southern hemisphere,

$R_{avg}$ is the average radius of curvature of a point on the
earth's surface, and

$[X\ Y\ Z]^T$ is the vector defining the position of a point.
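
Equations (2.7a), (2.7b) and (2.9a) to (2.9c) translate directly into a
short routine. The sketch below is illustrative only: the function name
is an assumption, angles are taken in radians, and the ellipsoid constants
in the usage line are approximate values for a commonly used reference
ellipsoid rather than values quoted in this paper.

```python
import math


def geodetic_to_cartesian(lat, lon, h, a_e, ecc_e):
    """Geocentric X, Y, Z of a point from geodetic latitude, longitude, elevation."""
    n = a_e / math.sqrt(1.0 - (ecc_e * math.sin(lat)) ** 2)  # (2.7a)
    delta_n = ecc_e**2 * n                                   # (2.7b)
    x = (n + h) * math.cos(lat) * math.cos(lon)              # (2.9a)
    y = (n + h) * math.cos(lat) * math.sin(lon)              # (2.9b)
    z = (n + h - delta_n) * math.sin(lat)                    # (2.9c)
    return x, y, z


# Illustrative ellipsoid: semi-major axis in metres and first eccentricity.
A_E, ECC_E = 6378137.0, 0.0818191908
x, y, z = geodetic_to_cartesian(math.radians(30.0), math.radians(-95.0), 10.0, A_E, ECC_E)
```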

Map  projections  is  the  last  area  where  necessary  formulas  can  be 
found.  Although  other  types  of  projection  may  be  applicable,  only  one 
type,  namely,  the  oblique  Mercator  projection,  was  arbitrarily  chosen. 
The  main  assumption  here  is  that  the  earth  is  a sphere. 

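(2.10a)  $U = -R \tan^{-1} \left[ \frac{\sin(\lambda - \lambda_p) \cos\phi}{\cos\phi \cos\phi_p \cos(\lambda - \lambda_p) - \sin\phi \sin\phi_p} \right]$

(2.10b)  $V = -\tfrac{1}{2} R \log \left[ \frac{1 + \sin\phi \sin\phi_p + \cos\phi \cos\phi_p \cos(\lambda - \lambda_p)}{1 - \sin\phi \sin\phi_p - \cos\phi \cos\phi_p \cos(\lambda - \lambda_p)} \right]$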


Figures 4a and 4b are included for clarification of the following
symbols:

$\phi_p$ and $\lambda_p$ are the latitude and the longitude, respectively, of the
projection pole P; the projection pole is the point of
intersection with the sphere of a line normal to the
central circle and passing through the earth's center;

$\phi$ and $\lambda$ are the latitude and longitude, respectively, of the
point to be projected;

U and V are the resulting map coordinates after projection; and

R is the radius of the best fitting tangent sphere to the
earth surface at the point of interest.

2.3 Sensor Modelling

The main purpose of sensor modelling is to recover the true
direction of the pixel vector at the moment of pixel imaging with
respect to the sensor coordinate system. The sensor coordinate system
is arbitrary, but for sensors of the scanner type, the following is a
convenient coordinate system (see Figure 5). The origin coincides
with the perspective center of the sensor optical system; the z-axis
bisects the scanning angle and is positive away from the object; the
y-axis is parallel to, and positive in, the scanning direction and it
is also perpendicular to the z-axis; the x-axis completes a right
handed coordinate system. In Figure 5, O is the origin, $2\alpha$ is the
scan angle, and the x-, y-, and z-axes are as shown. Every scan has
its own unique coordinate system. The pixel direction can be expressed
either as a unit vector or as a pair of coordinates in a plane
perpendicular to the z-axis. In the latter case, the z-coordinate of a
pixel is always constant. We will use the latter in our derivations.

Sensor  models  are  derived  for  both  the  multispectral  scanner  (MSS) 
and  the  thematic  mapper  (TM).  Essentially,  from  the  point  of  view  of 
sensor  modelling,  the  MSS  and  the  TM  are  the  same,  except  for  the  fact 
that  the  TM  uses  a scan  line  corrector  to  compensate  for  the  motion  of 
the  satellite  during  scanning.  This  is  necessary  because  unlike  the 
MSS  which  uses  only  the  forward  scan  for  imaging,  the  TM  uses  both  the 
forward  and  the  reverse  scan.  For  both  the  MSS  and  the  TM,  every  frame 
of  imagery  consists  of  a number  of  scans,  every  scan  consists  of  a 
number of lines and every line consists of a number of samples or pixels.

The position of a point in an image is defined by its row (r)
and column (c) numbers, which are not necessarily integers. The column
number c has to be corrected for the deviation of the number of samples
in one scan from the nominal, which is known as the line length
correction, and for the non-linearity of scanning. The line length
correction is applied by simply multiplying c by a constant factor
resulting in


where  Ns  is  the  observed  number  of  samples  in  one  scan,  is  the 
nominal number of samples in one scan, and c' is the column number with
line  length  correction  applied.  The  formula  assumes  that  the  scanning 
is  linear  in  time  or  equivalently,  that  the  velocity  of  scanning  is 
constant.  To  correct  for  the  non-linearity  in  scanning,  the  deviation 
of  c'  from  the  nominal  is  modelled  as  a polynomial  series  resulting  in 

(2.12)  $\Delta c' = a_0 + a_1 c' + a_2 c'^2 + a_3 c'^3 + a_4 c'^4 + \ldots$

where c' is defined in equation (2.11), $\Delta c'$ is the deviation of c' from
its correct value and $a_0$, $a_1$, $a_2$, $a_3$, $a_4$, ... are the coefficients of
the polynomial series measured during sensor calibration. The final
column number corresponding to a point is as follows:

(2.13)  $c'' = c' + \Delta c'$

where c" is the column number with both the line-length and scanning
non-linearity correction applied.
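
Equations (2.12) and (2.13) amount to evaluating a calibration polynomial
at c' and adding the result to c'. A minimal sketch follows; the function
name and the coefficient values are hypothetical, and the input is assumed
to be a column number that already carries the line length correction.

```python
def correct_column(c_prime, coeffs):
    """Return c'' = c' + delta_c', with delta_c' = a0 + a1*c' + a2*c'**2 + ...

    coeffs lists the calibration coefficients a0, a1, a2, ... in that order.
    """
    delta = 0.0
    for a in reversed(coeffs):  # Horner evaluation of the polynomial
        delta = delta * c_prime + a
    return c_prime + delta


# Hypothetical calibration coefficients, for illustration only.
c_double_prime = correct_column(100.0, [0.5, 0.01, 0.0001])
```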

For the MSS, the row number r of a point needs no correction. For
the TM, the row number is compensated for the effect of the scan line
corrector. The scan line corrector is an image motion compensation
device which attempts to cancel the relative motion between the
satellite and the ground during image acquisition in every scan. In
the TM, if no image motion compensation is applied, the ground coverage
of the forward and the reverse scan will not be parallel. The
compensation for the row number in the forward scan has the following
form:


(2.14a)  Arp  = -~ 


tc"  - 1] 


For the reverse scan, the compensation is the opposite of that for
the forward scan, hence


(2.14b)  ArR 


[c"  - 1], 


where: 

$\Delta r_F$ and $\Delta r_R$ are the compensations for the row number in the forward
and the reverse scans, respectively;

c"  is  defined  in  equation  (2.13); 

is  the  nominal  number  of  samples  in  one  scan; 

Sp  is  the  distance  travelled,  in  pixels,  of  the  satellite 

ground  track  in  one  scan.  For  aid  in  visualizing  the 
effect  of  the  scan  line  corrector  see  Figures  6a,  6b, 
and  6c. 

The corrected row number for both the TM forward and reverse scan is
given by

$r' = r + \Delta r$

where r' is the corrected row number and $\Delta r$ is either $\Delta r_F$ or $\Delta r_R$ as
defined in equation (2.14). The problem of determining whether a point
was imaged during the forward or the reverse scan by the TM is discussed
next.

We first assume that an image frame consists only of whole scans.
Then the scan number to which a point belongs is

(2.15)  $i_s = \lfloor (r - 1) / N_L \rfloor + 1$

where $\lfloor\ \rfloor$ means the largest integer not exceeding the value inside, r
is the uncorrected row number, and $N_L$ is the number of lines in one scan.
If the first scan is forward, then all odd scans are forward scans and
all even scans are reverse scans, and vice versa. The corrected line
number of a point, once its scan number is known, is

(2.16a)  $\ell = r' - (i_s - 1) N_L$

and its corrected sample number is equal to the corrected column number,
that is

(2.16b)  $s = c''$

where $\ell$ and s are the corrected line and sample numbers of a point,
respectively; and r' and c" are the corrected row and column numbers,
respectively.
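
The bookkeeping above can be sketched in a few lines. The sketch rests on
assumptions that should be checked against the original equations: rows
are taken to be numbered from 1, so the scan number is computed as
floor((r - 1) / N_L) + 1, and the function and argument names are
introduced here for illustration only.

```python
import math


def scan_and_line(r, r_corrected, lines_per_scan):
    """Return the scan number and the corrected line number within that scan.

    r is the uncorrected row number (assumed 1-based), r_corrected is r'.
    """
    scan = math.floor((r - 1) / lines_per_scan) + 1
    line = r_corrected - (scan - 1) * lines_per_scan  # as in (2.16a)
    return scan, line


def is_forward_scan(scan, first_scan_forward=True):
    """Odd scans share the direction of the first scan; even scans are opposite."""
    return (scan % 2 == 1) == first_scan_forward


# Row 17 with 16 lines per scan falls on the first line of the second scan.
scan, line = scan_and_line(17, 17.0, 16)
```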

The direction of a pixel vector with respect to the sensor system can
now be expressed in terms of $\ell$ and s. In Figure 7, $\alpha$ is proportional
to s, that is,

a = ur^T-T  (s  " ])  - 7 ’ 
and $\beta$ is proportional to $\ell$, that is,

$\beta = \frac{\theta}{N_L - 1} (\ell - 1) - \frac{\theta}{2}$,

where $\psi$ and $\theta$ are the total sensor angular coverages across and along
the satellite track, respectively. Also in Figure 7, p is the point on the
plane of the imagery (this plane is really part of a cylinder); p' is
its projection on a plane perpendicular to z; c is the principal
distance of the sensor optical system; and $x'_p$, $y'_p$ are the coordinates
of point p' on the plane perpendicular to z.

From Figure 7, the following relations are written:

(2.17a)  $y'_p = c \tan\alpha$,

(2.17b)  $x'_p = c \sec\alpha \tan\beta$,

and

(2.17c)  $z'_p = -c$.

These expressions for the coordinates of the pixel position projected on
a plane are the objective of sensor modelling.
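
As a small illustration of equations (2.17a) to (2.17c), the sketch below
converts the two scan angles into coordinates on the plane perpendicular
to the z-axis. The function name is an assumption, the angles are taken
in radians, and c is the principal distance in whatever unit the
coordinates are wanted.

```python
import math


def pixel_plane_coordinates(alpha, beta, c):
    """Coordinates of the projected pixel position on the plane z = -c."""
    y = c * math.tan(alpha)                   # (2.17a)
    x = c / math.cos(alpha) * math.tan(beta)  # (2.17b), sec(alpha) = 1/cos(alpha)
    z = -c                                    # (2.17c)
    return x, y, z


# A pixel on the z-axis maps to the foot of the axis on the plane.
x, y, z = pixel_plane_coordinates(0.0, 0.0, 1.0)
```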

2.4  Platform  Modelling 

In platform modelling, first an expression for the position of the
satellite in the ground coordinate system is derived. Then, a
transformation is defined which makes the ground coordinate system parallel
to the sensor coordinate system. Once these are done, the satellite
collinearity equation (equation (2.3)) is then readily derived.

The position of the satellite in terms of the ground coordinate
system can be defined in at least three ways. The first expresses the
satellite position in terms of its position vector $\vec{R}$. This approach
requires that the satellite orbit, needed for defining the sensor
attitude, be expressed in terms of $\vec{R}$ and the velocity vector. The
weakness of this approach is that we must express six variables as
unknown functions of time, three for the components of the position
vector and three for the components of the velocity vector, resulting in
models with very weak geometry. The usual solution for this shortcoming
is to assume that the position and velocity vectors are known a-priori.

The  second  approach  assumes  that  the  parameters  defining  the 
satellite  orbit  are  themselves  functions  of  time.  In  this  case,  we 
must  also  express  six  variables  as  unknown  functions  of  time.  As  in 
the first approach, the resulting model geometry is also very weak. One
common  solution  for  this  problem  in  this  case  is  to  assume  some  of  the 
parameters  as  fixed  or  known  a-priori. 

The  third  approach  assumes  that  the  parameters  defining  the 
satellite  orbit  are  independent  of  time.  Once  the  orbit  is  defined 
using  nominal  parameters,  the  nominal  position  of  the  satellite  in  the 
orbit  plane,  specifically  the  instantaneous  R and  the  true  anomaly,  v, 
can  be  defined  using  equations  (2.4)  and  (2.5),  if  the  orbit  is  assumed 
to  be  elliptical.  If  the  orbit  is  assumed  to  be  circular,  the  satellite 
position can be defined using equation (2.6) where $A_s$ is made equal to
the  radius  of  the  circular  orbit.  This  approach  requires  that  the 
three  components  of  the  small  deviation  of  the  actual  satellite  position 
from  the  predicted  position  using  nominal  orbital  parameters  be  modelled 
as  functions  of  time.  Compared  to  the  previous  two  approaches  which 
required  that  six  parameters  be  expressed  as  functions  of  time,  the  last 
approach  results  in  a much  stronger  geometry.  Therefore,  this  last 
approach  is  used  in  the  derivations  of  the  selected  model. 

The three components of the deviation of the satellite from its
nominal position are defined as follows: $\Delta R$ is the component parallel
to and in the same direction as the position vector $\vec{R}$ of the satellite;
$\Delta G$ is in the plane of the nominal orbit, perpendicular to $\Delta R$ and
positive in the direction of satellite motion; and $\Delta P$ is perpendicular
to the orbital plane. The set $\Delta G$, $\Delta P$, $\Delta R$ forms a right handed
coordinate system. Since these components are small, they can be modelled
quite well by the following polynomial series:

(2.18a)  $\Delta G = G_0 + G_1 (t - t_p) + G_2 (t - t_p)^2 + \ldots$

(2.18b)  $\Delta P = P_0 + P_1 (t - t_p) + P_2 (t - t_p)^2 + \ldots$

(2.18c)  $\Delta R = R_0 + R_1 (t - t_p) + R_2 (t - t_p)^2 + \ldots$

where $G_0$, $G_1$, $G_2$, ..., $P_0$, $P_1$, $P_2$, ..., $R_0$, $R_1$, $R_2$, ... are
coefficients of the corresponding polynomial terms; t is time, $t_p$ is the
time at the center of the frame; and t is zero at the ascending node.

The ground coordinate system used is the geocentric system where
the origin is the center of the earth, the X-axis passes through the
Greenwich meridian at the equator, the Z-axis is parallel to the rotational
axis of the earth and the Y-axis completes the right handed coordinate
system. This coordinate system rotates with the earth. We define our
inertial coordinate system to coincide with the ground coordinate system
when the satellite is at the ascending node, that is, when the satellite
crosses the plane of the earth's equator while travelling from south to
north. The only difference between the ground coordinate and the
inertial coordinate systems is that while the former rotates with the
earth, the latter maintains a constant angle with the projection of the
earth-sun line on the earth's equatorial plane. This convention
regarding the inertial coordinate system results in a plane orbit in this
coordinate system for sun-synchronous satellites such as Landsat. In
Figure 8, X Y Z is the ground coordinate system, and $X^I\ Y^I\ Z^I$ is the
inertial coordinate system. In the same figure:

A is the ascending node;

P is the perigee (the point in the satellite orbit
nearest the earth);

S is the satellite;

$\Omega$ is the longitude of A with respect to the inertial
coordinate system;

i is the inclination of the satellite orbit;

$\omega$ is the argument of the perigee;

v is the true anomaly;

R is the radial distance of the satellite from the earth's center;

$\omega_e$ is the angular velocity of the earth;

t is the time (t = 0 when the satellite is at the
ascending node); and

$\Delta G$, $\Delta P$, $\Delta R$ are the deviations of the satellite from its nominal
position.

To  define  the  satellite  position  in  the  ground  coordinate  system 
we  have  to  perform  a series  of  rotations  on  the  ground  coordinate 
system.  The  first  such  rotation  is  around  the  Z axis  which  brings  the 
ground  coordinate  system  into  the  inertial  coordinate  system  resulting 
in 


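(2.19)  $\begin{bmatrix} X^I \\ Y^I \\ Z^I \end{bmatrix} = \begin{bmatrix} \cos(-\omega_e t) & \sin(-\omega_e t) & 0 \\ -\sin(-\omega_e t) & \cos(-\omega_e t) & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$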


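The second rotation is around the $Z^I$-axis to make the $X^I$-axis coincide
with the line of nodes (which passes through A), giving

(2.20)  $\begin{bmatrix} X^1 \\ Y^1 \\ Z^1 \end{bmatrix} = \begin{bmatrix} \cos\Omega & \sin\Omega & 0 \\ -\sin\Omega & \cos\Omega & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X^I \\ Y^I \\ Z^I \end{bmatrix}$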

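Substituting equation (2.19) into equation (2.20), we get

(2.21)  $\begin{bmatrix} X^1 \\ Y^1 \\ Z^1 \end{bmatrix} = M_1 \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} \cos(\Omega - \omega_e t) & \sin(\Omega - \omega_e t) & 0 \\ -\sin(\Omega - \omega_e t) & \cos(\Omega - \omega_e t) & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$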

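The third rotation is around the $X^1$-axis by the angle $(\pi/2 + i)$, see
Figure 9, or

(2.22)  $\begin{bmatrix} X^2 \\ Y^2 \\ Z^2 \end{bmatrix} = M_2 \begin{bmatrix} X^1 \\ Y^1 \\ Z^1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(\pi/2 + i) & \sin(\pi/2 + i) \\ 0 & -\sin(\pi/2 + i) & \cos(\pi/2 + i) \end{bmatrix} \begin{bmatrix} X^1 \\ Y^1 \\ Z^1 \end{bmatrix}$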

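The $X^2$- and the $Z^2$-axes lie on the orbit plane while the $Y^2$-axis is
perpendicular to it. The next rotation is around the $Y^2$-axis such that
the $Z^3$-axis passes through the satellite position that is corrected
for the radial ($\Delta R$) and orbital ($\Delta G$) deviations. The resulting
equations are

(2.23a)  $\begin{bmatrix} X^3 \\ Y^3 \\ Z^3 \end{bmatrix} = M_3 \begin{bmatrix} X^2 \\ Y^2 \\ Z^2 \end{bmatrix} = \begin{bmatrix} \cos(\pi/2 + \omega + v + \theta_G) & 0 & -\sin(\pi/2 + \omega + v + \theta_G) \\ 0 & 1 & 0 \\ \sin(\pi/2 + \omega + v + \theta_G) & 0 & \cos(\pi/2 + \omega + v + \theta_G) \end{bmatrix} \begin{bmatrix} X^2 \\ Y^2 \\ Z^2 \end{bmatrix}$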


This can be seen more clearly in Figure 10, which shows the orbital plane
only; $\omega$ and v were defined previously; $R_G$ is the magnitude of the
vector sum of $\vec{R}$, $\Delta\vec{R}$, and $\Delta\vec{G}$. The angle $\theta_G$, which corrects for the
deviation of the satellite along the radial ($\Delta R$) and orbital ($\Delta G$)
directions, is defined as follows:

(2.23b)  $\theta_G = \tan^{-1} \left( \frac{\Delta G}{R + \Delta R} \right)$

The last rotation needed to define the satellite position in the ground coordinate system corrects for the deviation of the satellite position perpendicular to the satellite orbit ($\Delta P$). In Figure 11, S' is the actual satellite position; then

(2.24a)  $R_G = \sqrt{(R + \Delta R)^2 + \Delta G^2}$

(2.24b)  $R' = \sqrt{(R + \Delta R)^2 + \Delta G^2 + \Delta P^2}$

and

(2.24c)  $\theta_P = \tan^{-1}\left(\dfrac{\Delta P}{R_G}\right)$

Rotating around the $X^3$-axis by $-\theta_P$ brings the $X^3 Y^3 Z^3$ coordinate system into the $X^s Y^s Z^s$ coordinate system. The set of equations resulting from this rotation is

(2.24d)  $\begin{bmatrix} X^s \\ Y^s \\ Z^s \end{bmatrix} = M_4 \begin{bmatrix} X^3 \\ Y^3 \\ Z^3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(-\theta_P) & \sin(-\theta_P) \\ 0 & -\sin(-\theta_P) & \cos(-\theta_P) \end{bmatrix} \begin{bmatrix} X^3 \\ Y^3 \\ Z^3 \end{bmatrix}$
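The quantities in equations (2.23b) and (2.24a) to (2.24c) can be evaluated directly from the nominal radius and the three deviation components. The sketch below is only an illustration: the function name `deviation_geometry` and the numerical inputs are hypothetical, and it assumes that the angle of (2.23b) is the arctangent of the along-orbit deviation over the corrected radius.

```python
import math

def deviation_geometry(R, dR, dG, dP):
    """Return (R_G, R_prime, theta_G, theta_P) for radial, along-orbit and
    cross-orbit deviations dR, dG, dP (same length unit as R)."""
    R_G = math.hypot(R + dR, dG)                          # equation (2.24a)
    R_prime = math.sqrt((R + dR) ** 2 + dG ** 2 + dP ** 2)  # equation (2.24b)
    theta_G = math.atan2(dG, R + dR)                      # assumed form of (2.23b)
    theta_P = math.atan2(dP, R_G)                         # equation (2.24c)
    return R_G, R_prime, theta_G, theta_P

# Hypothetical inputs in metres (not values from the paper).
R_G, R_prime, theta_G, theta_P = deviation_geometry(7.08e6, 35.0, 500.0, 100.0)
```

With these definitions, R' is the hypotenuse of the right triangle formed by R_G and the cross-orbit deviation, which is why a single rotation by the angle of (2.24c) is enough to bring the third axis onto the actual satellite position.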

The $X^s Y^s Z^s$ system is the satellite coordinate system. The origin of the system is still the center of the earth; the $Z^s$-axis passes through the actual satellite position; the $X^s$-axis is parallel to the nominal satellite orbit and positive in the direction of satellite motion; and the $Y^s$-axis, which is not necessarily perpendicular to the nominal satellite orbit, completes the right-handed system.

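Collecting equations (2.21), (2.22), (2.23a), and (2.24d) together, we get

(2.25)  $\begin{bmatrix} X^s \\ Y^s \\ Z^s \end{bmatrix} = M_4 M_3 M_2 M_1 \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = M_s \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$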

Since $M_1$, $M_2$, $M_3$, and $M_4$ are all orthogonal matrices, $M_s$ is also orthogonal. It can be seen in Figure 11 that the vector which defines the actual satellite position in the $X^s Y^s Z^s$ coordinate system is $[0\ 0\ R']^t$, where $R'$ is defined in equation (2.24b). Therefore the position of the satellite in the ground coordinate system is

(2.26)  $\begin{bmatrix} X_s \\ Y_s \\ Z_s \end{bmatrix} = M_s^T \begin{bmatrix} 0 \\ 0 \\ R' \end{bmatrix}$
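The chain of four rotations and the back-transformation of equation (2.26) can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses NumPy, the helper names are hypothetical, the angle values are arbitrary, and the rotation matrices follow the element layout of equations (2.21), (2.22), (2.23a) and (2.24d).

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

def satellite_matrix(node, omega_e, t, incl, arg_perigee, true_anom,
                     theta_G, theta_P):
    """M_s = M_4 M_3 M_2 M_1 as collected in equation (2.25)."""
    m1 = rot_z(node - omega_e * t)
    m2 = rot_x(np.pi / 2 + incl)
    m3 = rot_y(np.pi / 2 + arg_perigee + true_anom + theta_G)
    m4 = rot_x(-theta_P)
    return m4 @ m3 @ m2 @ m1

# Arbitrary illustrative inputs (assumptions, not values from the paper).
M_s = satellite_matrix(node=np.radians(30.0), omega_e=7.292115e-5, t=1200.0,
                       incl=np.radians(99.0), arg_perigee=np.radians(10.0),
                       true_anom=np.radians(40.0), theta_G=7.0e-5,
                       theta_P=1.4e-5)
R_prime = 7.08e6  # metres, hypothetical
# Equation (2.26): the inverse of an orthogonal matrix is its transpose.
sat_ground = M_s.T @ np.array([0.0, 0.0, R_prime])
```

Because every factor is orthogonal, no matrix inversion is needed; the transpose maps the satellite-system vector back to ground coordinates and preserves its length.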
Once the satellite position in terms of the ground coordinate system is defined, the next step in platform modelling is to define the transformation $M$, which makes the ground coordinate system parallel to the sensor coordinate system. Since the transformation $M_s$, which brings the ground coordinate system into the satellite coordinate system, is already defined (see equation 2.25), we only have to derive the transformation which brings the satellite coordinate system into the sensor coordinate system. This latter transformation consists of a series of rotations which correct for the fact that the vertical does not pass through the center of the earth and which properly account for the attitude of the scanner coordinate system.

In Figure 12, the relative orientation between the satellite coordinate system $X^s Y^s Z^s$ and the ground coordinate system X Y Z is shown. In the same figure,

S'  is the ground track of the satellite S;
$R'$  is the distance of the satellite from the center of the earth;
$\theta_s$  is the latitude of the satellite;
$N_s$  is the radius of the prime vertical;
$\delta N_s$  is that part of the prime vertical below the equator for points in the northern hemisphere and above the equator for points in the southern hemisphere;
$\delta Z_s = \delta N_s \times \sin\theta_s$  is the projection of $\delta N_s$ on the Z-axis; and
$h_s$  is the elevation of the satellite.
The prime vertical $N_s$ and the elevation $h_s$ form a straight line which represents the vertical that passes through the satellite. It can be seen that the vertical does not pass through the center of the earth.

It is necessary to compensate for the non-coincidence of the vertical with the center of the earth because the vertical is the nominal direction of the z-axis of the sensor coordinate system as previously defined. This compensation can be done by making the $Z^s$-axis parallel to the vertical or, equivalently, by making the $Z^s$-axis pass through a point whose position is defined by the sum of the vectors $\vec{R}'$ and $[0\ 0\ \delta Z_s]^t$. The vector $[0\ 0\ \delta Z_s]^t$ is a function of the satellite latitude $\theta_s$, which in turn is related to the satellite coordinates $X_s$, $Y_s$, $Z_s$ via equation (2.9). This can be seen more clearly in Figure 13, which is a simplified version of Figure 12.

To define the angular rotations necessary for making the $Z^s$-axis parallel to the vertical, we first have to transform the vector $[0\ 0\ \delta Z_s]^t$ into the satellite coordinate system $X^s Y^s Z^s$. The result of the transformation using equation (2.25) is

(2.27)  $\begin{bmatrix} \delta X_s^s \\ \delta Y_s^s \\ \delta Z_s^s \end{bmatrix} = M_s \begin{bmatrix} 0 \\ 0 \\ \delta Z_s \end{bmatrix}$

The elements in equation (2.27) are also shown in Figure 13.

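The first rotation to make the $Z^s$-axis parallel to the vertical is around the $Y^s$-axis by the angle $\theta_x$ (see Figure 14), which results in

(2.28a)  $\begin{bmatrix} X^5 \\ Y^5 \\ Z^5 \end{bmatrix} = M_5 \begin{bmatrix} X^s \\ Y^s \\ Z^s \end{bmatrix} = \begin{bmatrix} \cos\theta_x & 0 & -\sin\theta_x \\ 0 & 1 & 0 \\ \sin\theta_x & 0 & \cos\theta_x \end{bmatrix} \begin{bmatrix} X^s \\ Y^s \\ Z^s \end{bmatrix}$

where

(2.28b)  $\theta_x = \tan^{-1}\left(\dfrac{\delta X_s^s}{R' + \delta Z_s^s}\right)$

$R'$ is the radius of the satellite defined in equation (2.24b), and $\delta X_s^s$, $\delta Z_s^s$ are defined in equation (2.27).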

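The second rotation is around the $X^5$-axis by the angle $-\theta_y$ (see Figure 15) such that

(2.29a)  $\begin{bmatrix} X^6 \\ Y^6 \\ Z^6 \end{bmatrix} = M_6 \begin{bmatrix} X^5 \\ Y^5 \\ Z^5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(-\theta_y) & \sin(-\theta_y) \\ 0 & -\sin(-\theta_y) & \cos(-\theta_y) \end{bmatrix} \begin{bmatrix} X^5 \\ Y^5 \\ Z^5 \end{bmatrix}$

where

(2.29b)  $R'' = \sqrt{(R' + \delta Z_s^s)^2 + (\delta X_s^s)^2}$ ,

(2.29c)  $\theta_y = \tan^{-1}\left(\dfrac{\delta Y_s^s}{R''}\right)$ ,

$R'$, $\delta X_s^s$, $\delta Z_s^s$ are the same as in equation (2.28) and $\delta Y_s^s$ is defined in equation (2.27).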
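A quick numerical check of the two vertical-alignment rotations is sketched below. It assumes NumPy, hypothetical helper names, and arbitrary offset values; it also assumes that the angles of (2.28b) and (2.29c) are the arctangents of the first offset component over the corrected radius and of the second offset component over R'', respectively. Under those assumptions, the point defined by the radius vector plus the offset of equation (2.27) ends up on the third axis after both rotations.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]])

def vertical_alignment(R_prime, offset):
    """Return M_6 M_5 for an offset (dX, dY, dZ) given in the satellite system."""
    dX, dY, dZ = offset
    theta_x = np.arctan2(dX, R_prime + dZ)      # assumed form of (2.28b)
    R_dprime = np.hypot(R_prime + dZ, dX)       # equation (2.29b)
    theta_y = np.arctan2(dY, R_dprime)          # assumed form of (2.29c)
    return rot_x(-theta_y) @ rot_y(theta_x)

# Hypothetical values in metres (not from the paper).
R_prime = 7.08e6
offset = np.array([1.2e4, -8.0e3, 3.0e4])
target = np.array([offset[0], offset[1], R_prime + offset[2]])

M65 = vertical_alignment(R_prime, offset)
aligned = M65 @ target   # expected to lie along the third axis
```

The first two components of `aligned` vanish to rounding error and the third equals the length of `target`, which is the sense in which the rotated axis passes through the target point.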

After making the $Z^s$-axis of the satellite coordinate system $X^s Y^s Z^s$ parallel to the vertical, we then have to account for the attitude of the sensor coordinate system during pixel imaging. This is done through a series of sequential rotations to correct for the roll $\omega$, the pitch $\phi$, and the yaw $\kappa$, applied in that order. The first rotation is that due to the roll $\omega$, resulting in

(2.30)  $\begin{bmatrix} X^\omega \\ Y^\omega \\ Z^\omega \end{bmatrix} = M_\omega \begin{bmatrix} X^6 \\ Y^6 \\ Z^6 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\omega & \sin\omega \\ 0 & -\sin\omega & \cos\omega \end{bmatrix} \begin{bmatrix} X^6 \\ Y^6 \\ Z^6 \end{bmatrix}$

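The next rotation is to compensate for the pitch $\phi$, such that

(2.31)  $\begin{bmatrix} X^\phi \\ Y^\phi \\ Z^\phi \end{bmatrix} = M_\phi \begin{bmatrix} X^\omega \\ Y^\omega \\ Z^\omega \end{bmatrix} = \begin{bmatrix} \cos\phi & 0 & -\sin\phi \\ 0 & 1 & 0 \\ \sin\phi & 0 & \cos\phi \end{bmatrix} \begin{bmatrix} X^\omega \\ Y^\omega \\ Z^\omega \end{bmatrix}$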

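The last rotation, which accounts for the yaw $\kappa$, produces the following set of equations:

(2.32)  $\begin{bmatrix} X' \\ Y' \\ Z' \end{bmatrix} = M_\kappa \begin{bmatrix} X^\phi \\ Y^\phi \\ Z^\phi \end{bmatrix} = \begin{bmatrix} \cos\kappa & \sin\kappa & 0 \\ -\sin\kappa & \cos\kappa & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X^\phi \\ Y^\phi \\ Z^\phi \end{bmatrix}$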
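Since the roll, pitch and yaw rotations are applied in a fixed order, it may help to see numerically that the order matters. The sketch below assumes NumPy, uses hypothetical helper names, and picks arbitrary angles that are larger than realistic attitude errors so that the effect is visible.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

def attitude_matrix(roll, pitch, yaw):
    """Roll first, then pitch, then yaw: the product M_kappa M_phi M_omega."""
    return rot_z(yaw) @ rot_y(pitch) @ rot_x(roll)

# Arbitrary illustrative angles (assumptions, not values from the paper).
roll, pitch, yaw = np.radians([2.0, 1.0, 3.0])
M_att = attitude_matrix(roll, pitch, yaw)

# Applying the same three rotations in the reverse order gives a different matrix.
M_reversed = rot_x(roll) @ rot_y(pitch) @ rot_z(yaw)
```

`M_att` is orthogonal, as any product of rotations is, but it differs from `M_reversed`; the difference is of second order in the angles, so it shrinks quickly as the attitude errors become small.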

Since each pixel has its own unique attitude, we have to parameterize its components $\omega$, $\phi$, $\kappa$ in terms of time in a similar manner to what was previously done to the components of the deviation of the satellite position. We also selected polynomials in this case, resulting in:

(2.33a)  $\omega = \omega_0 + \omega_1 (t - t_F) + \omega_2 (t - t_F)^2 + \omega_3 (t - t_F)^3 + \ldots$

(2.33b)  $\phi = \phi_0 + \phi_1 (t - t_F) + \phi_2 (t - t_F)^2 + \phi_3 (t - t_F)^3 + \ldots$

(2.33c)  $\kappa = \kappa_0 + \kappa_1 (t - t_F) + \kappa_2 (t - t_F)^2 + \kappa_3 (t - t_F)^3 + \ldots$

where $t$ is time, which is zero at the satellite ascending node, and $t_F$ is the time of imaging of the frame center.
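The attitude polynomials of equation (2.33) are straightforward to evaluate. The sketch below uses Horner's scheme; the function name `attitude_component` and the coefficient values are hypothetical and serve only to illustrate the form of the expression.

```python
def attitude_component(coeffs, t, t_F):
    """Evaluate c0 + c1*(t - t_F) + c2*(t - t_F)**2 + ... by Horner's scheme."""
    dt = t - t_F
    value = 0.0
    for c in reversed(coeffs):
        value = value * dt + c
    return value

# Hypothetical roll coefficients (rad, rad/s, rad/s^2, rad/s^3).
roll_coeffs = [1.0e-3, 2.0e-5, -3.0e-7, 0.0]
t_F = 1500.0  # hypothetical time of imaging of the frame center, in seconds

roll_at_center = attitude_component(roll_coeffs, t_F, t_F)
roll_later = attitude_component(roll_coeffs, t_F + 10.0, t_F)
```

At the frame center the polynomial reduces to its constant term, which is why the zero-order coefficients can be read as the attitude of the frame center itself.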
Combining equations (2.28a), (2.29a), (2.30), (2.31), and (2.32) results in

(2.34)  $\begin{bmatrix} X' \\ Y' \\ Z' \end{bmatrix} = M_\kappa M_\phi M_\omega M_6 M_5 \begin{bmatrix} X^s \\ Y^s \\ Z^s \end{bmatrix} = M_a \begin{bmatrix} X^s \\ Y^s \\ Z^s \end{bmatrix}$

All the matrices involved in equation (2.34) are orthogonal. Substituting equation (2.25) into equation (2.34) gives

(2.35)  $\begin{bmatrix} X' \\ Y' \\ Z' \end{bmatrix} = M_a M_s \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = M \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$

The coordinate system X' Y' Z' with origin at the center of the earth is parallel to the sensor coordinate system x y z. The derivation of $M$ and the previous derivation of the satellite position vector $[X_s\ Y_s\ Z_s]^t$ complete platform modelling.

2.5  Combined  Sensor/Platform  Model  and  Applications 
The  sensor  and  the  platform  models  were  derived  independently  of 
each  other.  A convenient  method  of  relating  them  is  to  express  at 
least  some  quantities  involved  in  the  platform  model  as  functions  of 
position  of  points  in  the  imagery.  Since  pixel  imaging  is  done 
sequentially  with  respect  to  time,  it  follows  that  pixel  positions  are 
also  functions  of  time.  We  may  then  reverse  the  relationship  and 
express  time  as  a function  of  pixel  positions.  Furthermore,  since  some 
of  the  parameters  in  the  platform  model  are  functions  of  time,  these 
parameters  are  also  functions  of  pixel  position.  Thus,  we  are  able  to 
relate  the  platform  model  to  the  sensor  model. 
A convenient expression for time in terms of the pixel position for the MSS and for the odd scans of the TM is

(2.36a)  $t = t_F + 2 (i_s - 1) \dfrac{\Delta t_c}{2} + (c - 1) \dfrac{\Delta t_s}{N_p} - \dfrac{N_s}{2} \Delta t_c$

For the even scans of the TM, the corresponding expression is

(2.36b)  $t = t_F + [2 (i_s - 1) + 1] \dfrac{\Delta t_c}{2} + (c - 1) \dfrac{\Delta t_s}{N_p} - \dfrac{N_s}{2} \Delta t_c$

Terms in both equations are defined as follows:

$t$  is the elapsed time which is zero at the satellite ascending node;
$t_F$  is the time of imaging of the pixel center (approximate);
$i_s$  is the scan line number to which a pixel belongs;
$c$  is the uncorrected pixel column number;
$n_p$  is the nominal number of pixels in one scan;
$N_p$  is the actual number of pixels in one scan;
$N_s$  is the number of scans in one frame;
$\Delta t_c$  is the sensor cycling time; and
$\Delta t_s$  is the one active scanning interval of the sensor.


If the odd scan for TM is the reverse scan, $(N_p - c + 1)$ should be substituted for $c$ in equation (2.36a), and if the even scan is the reverse scan, $(N_p - c + 1)$ should be substituted for $c$ in equation (2.36b). The main assumption in equation (2.36) is that all pixels in one column for a given scan are sampled simultaneously.

The  combined  sensor  and  platform  model  is  expressed  by  the 
satellite  collinearity  equation  given  in  Section  2.1,  or 
Sensor modelling defined the vector $[x_p\ y_p\ z_p]^t$ in terms of the pixel image row and column numbers. Platform modelling defined the satellite position vector $[X_s\ Y_s\ Z_s]^t$ and the orthogonal matrix $M$ in terms of the orbit parameters, satellite position deviation parameters, attitude parameters, and time. Then equation (2.36) related the sensor and platform models by defining time in terms of image pixel position.

The satellite collinearity equation can be used for producing simulated data useful for studying rectification. For this application, equation (2.1) is inverted to the form


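(2.37)  $\begin{bmatrix} X_g \\ Y_g \\ Z_g \end{bmatrix} = \begin{bmatrix} X_s \\ Y_s \\ Z_s \end{bmatrix} + \dfrac{1}{\lambda} M^T \begin{bmatrix} x_p \\ y_p \\ z_p \end{bmatrix}$

where $\lambda$ denotes the scale factor of the collinearity equation.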
Using equation (2.37), the ground position $X_g$, $Y_g$, $Z_g$ of a pixel can be solved for given the following: the pixel row and column number in the image; the satellite orbit parameters $\Omega$, $i$, $\omega$, $A_s$, and $e_s$; the parameters defining the satellite position deviation components $\Delta G$, $\Delta P$, $\Delta R$; the parameters defining the sensor attitude components $\omega$, $\phi$, $\kappa$; the parameters defining time ($t_F$, $\Delta t_c$, $\Delta t_s$); the sensor constants $n_p$, $N_p$, $N_s$, $c$ and the scanning non-linearity correction constants; the earth related constants $A_e$, $e_e$, $\omega_e$, $G$, $M_e$; and the elevation $h$ of the point. This procedure will, in effect, give us pixels whose ground positions are perfectly known.
For rectification, the original form of the satellite collinearity equation (equation 2.1) is used. The vector $[x_p\ y_p\ z_p]^t$ is first computed using the sensor calibration constants and the pixel image row and column numbers; this vector is considered as the observation in the subsequent adjustment procedure. Then the right hand side of equation (2.1) is linearized in terms of the parameters defining satellite deviation components, the parameters defining time, and the ground coordinates. The ground coordinates are considered either as constants or as observations. The orbit parameters are estimated using a-priori information and assumed constant because effects of errors in their a-priori estimates are compensated for by the parameters defining the satellite position deviation. Using control points with known image and ground position, the unknown parameters are solved for in an adjustment procedure. Any a-priori information regarding the unknown parameters can be incorporated into the adjustment using the proper variance-covariance matrices.

3.  ACCURACY  STUDIES  USING  SYNTHETIC  DATA 

3.1  Effect  of  Parameter  Perturbations 
Essentially, all rectification methods require that we have knowledge of the values of the parameters of the model being utilized. These parameters can be estimated using ground control points or they can be independently observed or both. Once these parameters are known, the ground position of pixels can be readily computed. Rectification accuracy, therefore, is directly affected by the accuracy of the parameter values.
One application of equation (2.37), which is the form of the satellite collinearity equation suited for simulation, is for computing the effect on pixel ground position of perturbations on the nominal values of the parameters. The effect on pixel ground positions of a perturbation applied to a single parameter can be seen in Table I. In this table, the tabulated values are the individual perturbations; the root mean square displacements in pixel ground position resulting from each individual perturbation are shown in the heading. It can be seen that within the range of values of interest, the resulting displacement varies linearly with the applied perturbations for all the parameters listed.

Also listed in Table I are the present accuracies of some independently observed parameters for the MSS and the TM together with the ground displacements (in brackets) produced by their standard deviations. It can be seen that for the MSS, inaccuracies in the observed values of roll ($\omega$) and pitch ($\phi$) produced the largest ground displacement, followed by errors in the satellite position deviation parameter along the orbital plane (G) and in the sensor cycling time ($\Delta t_c$).
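Because the displacement varies linearly with the perturbation, the bracketed ground displacements in Table I follow from a simple proportion. The sketch below reproduces three of them from the tabulated values; the function name `displacement` and the dictionary layout are illustrative choices, while the numbers are taken from Table I.

```python
def displacement(unit_perturbation, accuracy, rms_at_unit=0.80):
    """Scale the tabulated RMS displacement linearly to a given 1-sigma accuracy.

    unit_perturbation is the Table I perturbation that produces rms_at_unit metres.
    """
    return rms_at_unit * accuracy / unit_perturbation

# (perturbation giving 0.80 m RMS, present 1-sigma accuracy), both from Table I.
table_entries = {
    "sensor cycling time (msec)": (0.001, 0.400),
    "inclination (deg x 1e-3)": (0.562, 45.0),
    "R_0 (m)": (12.5, 35.0),
}

results = {name: displacement(unit, acc)
           for name, (unit, acc) in table_entries.items()}
```

The three results come out at about 320 m, 64 m and 2.24 m, matching the bracketed entries in the table for those rows.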

Table  II  shows  the  ground  displacements  when  all  the  parameters  are 
perturbed  simultaneously.  A set  of  perturbations  corresponds  to  a 
column  in  Table  I and  is  represented  in  the  left  column  of  Table  II  by 
the  ground  displacement  produced  by  the  individual  parameters.  Note 
that  each  perturbation  in  the  set  produces  identical  ground  displacements 
when  applied  individually.  The  resulting  ground  displacements  due  to  the 
combined  perturbations  are  tabulated  in  the  right  column. 


3.2  Comparison  of  Different  Mathematical  Models 

One factor which affects the accuracy of rectification is the type of model used. By its very nature, the geometry of the satellite imagery is very weak. Because of this, even the best models presently existing do not allow for the recovery of parameters defining the satellite position deviation components and the attitude elements at the same time. The model we proposed in Section 2 is capable of recovering all of these parameters at the same time with one exception: instead of the satellite position deviation component along the orbit, we recover the time of imaging of the frame center. Both of these parameters cause the frame to be displaced along the orbit and, for small deviations, one can satisfactorily take the place of the other.

We used five models in our test. They are: (1) the full model in Section 2, which assumes that the earth is an ellipsoid of revolution and that the orbit of the satellite is an ellipse; (2) the same model as (1), except that the orbit of the satellite is assumed to be a circle instead of an ellipse; (3) the same model as (2), with the additional assumption that the earth is a sphere; (4) the model used for aircraft scanner data, which assumes that the orbit is a straight line and requires that the earth be projected on a mapping plane; and (5) the polynomial interpolative model. Two cases are run for each model.

The results for the two cases are shown in Table III. Case I assumes that there is no error in identifying the control points on either the image or the ground, and that there is no error in the derived or measured point position in either the image or the ground. There are 156 control and 156 check points that are both well distributed. Case R assumes that there is no error in identifying control points on the ground; that the standard deviation of the measured ground position of control points in each of the axes is 15 m, resulting in a 26 m standard deviation when combined (21 m in plan); that the error in identifying points in the image is uniformly distributed from -0.5 to 0.5 pixel, with a resulting standard deviation of 0.28 pixel in both the across and along scan directions; and that the errors in the derived position of points in the image due to sensor instabilities, not including identification errors, are .01 and .5 pixel in the across and along scan directions respectively. The total errors in position of points across and along scan are .29 pixel (23 m) and .58 pixel (34 m) respectively; the combined error is 41 m.
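The Case R error budget can be checked with a few lines of arithmetic, assuming the individual error sources are independent so that standard deviations combine in quadrature. The variable names below are illustrative. Note that a uniform error on an interval of width one pixel has a standard deviation of 1/sqrt(12), about 0.289 pixel, which the text quotes as 0.28.

```python
import math

# Ground control: 15 m standard deviation on each of three axes.
sigma_axis = 15.0
sigma_plan = sigma_axis * math.sqrt(2.0)    # two horizontal axes
sigma_total = sigma_axis * math.sqrt(3.0)   # all three axes

# Image identification error: uniform on [-0.5, 0.5] pixel.
sigma_ident = 1.0 / math.sqrt(12.0)

# Add the sensor instability terms (.01 and .5 pixel) in quadrature.
sigma_across = math.hypot(sigma_ident, 0.01)
sigma_along = math.hypot(sigma_ident, 0.5)

# Combine the across and along scan errors expressed in metres.
combined_m = math.hypot(23.0, 34.0)
```

Rounded, these give 21 m in plan, 26 m in total, .29 and .58 pixel, and 41 m, in agreement with the figures quoted for Case R.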

Since the data for Case I are perfect, the resulting standard deviation in both the control and check points can be considered as systematic error caused by an inadequate model. Table III shows that only the last two models are inadequate in describing the geometry of the imagery. Case R, however, shows that if the errors in both the image and ground position of points are not appreciably smaller than the systematic error introduced by the model, there is really no advantage in using more sophisticated ones.

3.3  Effect  of  Different  Control  Densities 

Another factor which affects the accuracy of rectification is the number or density of control. This experiment simply varies the number of control points in the two cases (I and R) studied. The model used in both cases is Model (1) in Section 3.2. The assumptions regarding the accuracy of derived or measured position of points on both the image and the ground in Section 3.2 for Case I and Case R apply in this section as well.

The  results  are  shown  in  Table  IV.  For  Case  I where  the  position 
of  points  in  both  the  image  and  the  ground  are  perfect,  whenever  the 
number  of  equations  (2  per  control  point)  exceeds  the  number  of  unknown 
parameters  (19  in  this  case)  rectification  is  almost  perfect.  Case  R 
shows  that  any  increase  in  the  density  of  control  points  after  a certain 
number  is  reached  (approximately  25  points  in  this  case)  results  only  in 
a marginal  increase  in  rectification  accuracy. 
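The counting argument behind the Case I result can be made explicit. With two equations per control point and 19 unknown parameters, the sketch below finds the smallest number of control points for which the equations outnumber the unknowns, and the redundancy at 25 points. The function names are hypothetical.

```python
def min_control_points(n_unknowns, eqs_per_point=2):
    """Smallest number of points whose equations strictly exceed the unknowns."""
    return n_unknowns // eqs_per_point + 1

def redundancy(n_points, n_unknowns, eqs_per_point=2):
    """Number of equations in excess of the unknowns."""
    return n_points * eqs_per_point - n_unknowns

n_min = min_control_points(19)   # 10 points give 20 equations
extra_at_25 = redundancy(25, 19)  # 50 equations for 19 unknowns
```

With perfect data, any positive redundancy is enough; with noisy data (Case R), additional redundancy averages down the random errors, which is consistent with the diminishing gains reported beyond about 25 points.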

3.4  Effect  of  Different  Control  Point  Ground  Position  Accuracy 

The next factor that significantly affects rectification accuracy is the accuracy of the measured ground position of control points. We assume that there is no identification error of control points on the ground, only measurement errors of ground position. Again, two cases are involved, Cases I and R. Both cases use Model (1) in Section 3.2 for rectification. Case I has 156 control points while Case R has only 25. Again, the assumptions for Cases I and R in Section 3.2 regarding the position of points in the image apply in this case.

Table V shows the effect of varying the accuracy of control point ground position for both cases. In Case I, where image position is perfect, roughly 80% of the error in the ground position of control points is compensated for by the rectification process. In Case R, decreasing the standard deviation of control point ground position below that of the corresponding standard deviation in the image will not increase rectification accuracy.
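The "roughly 80%" figure can be examined against the Case I columns of Table V by comparing the residual RMS with the total control error in each row. The sketch below does this for the rows with a total control error of 26 m or more; the two smallest rows are left out because integer rounding dominates there. The variable names are illustrative and the numbers are those tabulated.

```python
# (total control sigma, Case I control-point RMS, Case I check-point RMS), metres.
table_v_case_i = [
    (26, 5, 4), (43, 8, 7), (87, 15, 13), (130, 22, 19),
    (173, 30, 26), (260, 45, 39), (346, 60, 51),
]

# Fraction of the control error that does not show up in the residuals.
compensated_control = [1.0 - ctrl / total for total, ctrl, _ in table_v_case_i]
compensated_check = [1.0 - chk / total for total, _, chk in table_v_case_i]
```

For these rows the compensated fraction stays between about 0.80 and 0.86, slightly higher at the check points than at the control points.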
3.5  Effect of Derived Image Position Accuracy

During the imaging process, the direction of the ray which produced the image of a given point is defined in the sensor coordinate system. The accuracy with which we can reconstruct this direction in the sensor coordinate system depends on the accuracy of the identification of points in the image and the geometric stability of the sensor.

Table  VI  shows  the  effect  of  image  position  errors  on  rectification 
accuracy.  Again,  two  cases  are  presented.  Both  cases  use  Model  (1)  in 
Section  3.2  as  the  rectification  model.  Case  I has  156  points  and  Case 
R has  25  points.  The  assumptions  regarding  the  accuracy  of  ground 
position  of  control  points  in  Section  3.2  apply  here  as  well. 

It  can  be  seen  from  Table  VI  that  only  a very  small  percentage  of 
errors  in  the  image  position  is  compensated  for  by  the  rectification 
process.  This  is  true  for  both  Cases  I and  R. 

4.  CONCLUSIONS  AND  RECOMMENDATIONS 

1.  It is possible to recover all parameters defining satellite position deviation and sensor attitude using appropriate models.

2.  Uncertainties in the roll ($\omega$) and the pitch ($\phi$) of the sensor contribute the greatest errors in system corrected images, followed by uncertainties in the satellite position along the orbit and the sensor cycling time.

3.  Polynomial models and those that assume that the orbit is a straight line and that require the projection of the earth's surface on a mapping plane cannot produce rectification accuracies better than half a pixel.

4.  Only a marginal increase in rectification accuracy results from increasing the number of control points above 25.

5.  A large percentage of errors in ground position of control points is compensated for by the rectification process.

6.  A very small percentage of error in image position is compensated for by the rectification process.

7.  Sub-pixel rectification is possible only if points on the image can be identified to sub-pixel accuracies.

8.  Improving the identification accuracy of points on the image is worth further investigation since rectification accuracy is highly sensitive to this error.

9.  With the sensor/platform model now available, several other registration/rectification problems can be researched. These include: (1) investigation of image correspondence; (2) study of different control types, such as points, areas, relative control, and use of geometric constraints; and (3) analysis of the optimum registration/rectification sequence.

10.  Other fundamental research areas within the general problem of registration/rectification of remote sensing data include: (a) accuracy measures; (b) reduction (photogrammetric) of multiple spatial coverage with the same and different sensors; and (c) efficient means of rectification of sensor data to digital terrain models.
TABLE I  EFFECT OF PERTURBATION ON A SINGLE PARAMETER ON GROUND POSITION

                             AMOUNT OF INDIVIDUAL PARAMETER
                             PERTURBATION FOR AN RMS (m) OF
PARAMETER                    0.80      8.00      80.0      800.      PRESENT* ACCURACY (1σ)

TIME PARAMETERS
t_F (msec)                   .120      1.20      12.0      120.      120.0 (80.0 m)
Δt_c (msec)                  .001      .010      .100      1.00      .400 (320. m)
Δt_s (msec)                  .205      2.05      20.5      205.      .003 (neg.)

ORBIT PARAMETERS
Ω (deg x 10^-3)              .00716    .0716     .716      7.16
i (deg x 10^-3)              .562      5.62      56.2      562.      MSS 45.0 (64.0 m); TM 45.0 (64.0 m)
ω (deg x 10^-3)              3.04      30.4      304.      3040.
A_s (m)                      .195      1.95      19.5      195.
e_s (x 10^-6)                1.65      16.5      165.      1650.

SATELLITE POSITION PERTURBATION PARAMETERS
G_0 (m)                      .900      9.00      90.0      900.      500. (444. m)
G_1 (m/sec)                  .100      1.00      10.0      100.
G_2 (m/sec^2)                .0085     .085      .850      8.50
P_0 (m)                      .900      9.00      90.0      900.      100. (89.0 m)
P_1 (m/sec)                  .100      1.00      10.0      100.
P_2 (m/sec^2)                .0085     .085      .850      8.50
R_0 (m)                      12.5      125.      1250.     12500.    35. (2.24 m)
R_1 (m/sec)                  1.40      14.0      140.      1400.
R_2 (m/sec^2)                .115      1.15      11.5      115.

SENSOR ATTITUDE PARAMETERS
ω_0 (deg x 10^-3)            .0504     .504      5.04      50.4      MSS 100. (1590. m); TM 10.0 (159. m)
ω_1 (deg/sec x 10^-3)        .00555    .0555     .555      5.55      MSS 10.0 (1440. m); TM .001 (.144 m)
ω_2 (deg/sec^2 x 10^-6)      .458      4.58      45.8      458.
ω_3 (deg/sec^3 x 10^-6)      .0355     .355      3.55      35.5
φ_0 (deg x 10^-3)            .0504     .504      5.04      50.4      MSS 100. (1590. m); TM 10.0 (159. m)
φ_1 (deg/sec x 10^-3)        .00561    .0561     .561      5.61      MSS 10.0 (1430. m); TM .001 (.143 m)
φ_2 (deg/sec^2 x 10^-6)      .458      4.58      45.8      458.
φ_3 (deg/sec^3 x 10^-6)      .0355     .355      3.55      35.5
κ_0 (deg x 10^-3)            .802      8.02      80.2      802.      MSS 100. (100. m); TM [illegible]
κ_1 (deg/sec x 10^-3)        .0859     .859      8.59      85.9      MSS 10.0 (93. m); TM .001 (.009 m)
κ_2 (deg/sec^2 x 10^-6)      7.16      71.6      716.      7160.
κ_3 (deg/sec^3 x 10^-6)      .561      5.61      56.1      561.

* PRESENT RMS MEASUREMENT ACCURACY OF EACH PARAMETER AS REPORTED IN LITERATURE
TABLE II  EFFECT OF COMBINED PERTURBATIONS IN ALL PARAMETERS ON GROUND POSITION

RMS POSITION CHANGE DUE TO        RMS POSITION CHANGE DUE TO
INDIVIDUAL PERTURBATION (m)       COMBINED PERTURBATIONS (m)

0.80                              5.28
8.00                              52.9
80.0                              536.
800.                              6,370.

TABLE III  COMPARISON OF DIFFERENT MATHEMATICAL MODELS

               CASE I (RMS m)              CASE R (RMS m)
MODEL      CONTROL      CHECK          CONTROL      CHECK
           POINT        POINT          POINT        POINT

(1)        <1           <1             36           50
(2)        <1           <1             38           48
(3)        2            2              38           48
(4)        36           31             45           43
(5)        38           38             60           57

Case I:  156 Control Points
         156 Check Points
         σ_control = 0
         σ_pixel = 0

Case R:  25 Control Points
         156 Check Points
         σ_control:  σ_x = σ_y = σ_z = 15 m
                     σ_plan = 21 m
                     σ_total = 26 m
         σ_pixel:    σ_x = .29 pixel (23 m)
                     σ_y = .58 pixel (34 m)
                     σ_total = 41 m
TABLE IV  EFFECT OF DIFFERENT CONTROL DENSITIES

Case I:  Model: Ellipsoidal Earth, Elliptical Orbit
         σ_control = 0
         σ_pixel = 0
         156 Check Points

Case R:  Model: Ellipsoidal Earth, Elliptical Orbit
         σ_control:  σ_x = σ_y = σ_z = 15 m
                     σ_total = 26 m
         σ_pixel:    σ_x = .29 pixel (23 m)
                     σ_y = .58 pixel (34 m)
                     σ_total = 41 m
         156 Check Points
TABLE V  EFFECT OF DIFFERENT CONTROL POINT GROUND POSITION ACCURACY

CONTROL ACCURACY                    CASE I (RMS m)           CASE R (RMS m)
σ_x = σ_y = σ_z (m)   σ_total (m)   CONTROL     CHECK        CONTROL     CHECK
                                    POINT       POINT        POINT       POINT

0                     0             <1          <1           34          47
5                     9             2           1            35          48
15                    26            5           4            36          50
25                    43            8           7            41          52
50                    87            15          13           58          62
75                    130           22          19           83          80
100                   173           30          26           107         98
150                   260           45          39           154         135
200                   346           60          51           199         172

Case I:  Model: Ellipsoidal Earth, Elliptical Orbit
         156 Control Points
         156 Check Points
         σ_pixel: 0

Case R:  Model: Ellipsoidal Earth, Elliptical Orbit
         25 Control Points
         156 Check Points
         σ_pixel:  σ_x = .29 pixel (23 m)
                   σ_y = .58 pixel (34 m)
                   σ_total = 41 m
TABLE VI  EFFECT OF DERIVED IMAGE POSITION ACCURACY

IMAGE POSITION ACCURACY (PIXEL)     CASE I (RMS m)           CASE R (RMS m)
σ_x              σ_y                CONTROL     CHECK        CONTROL     CHECK
                                    POINT       POINT        POINT       POINT

0                0                  <1          <1           15          11
.29 (23 m)       .31 (18 m)         30          31           30          43
.29 (23 m)       .58 (34 m)         40          40           36          50
.29 (23 m)       .76 (44 m)         48          48           41          56
.30 (24 m)       1.04 (60 m)        60          61           47          70
.31 (25 m)       1.53 (89 m)        84          85           64          96
.33 (26 m)       2.02 (117 m)       109         109          79          122
.35 (28 m)       5.01 (291 m)       261         262          179         289

Case I:  Model: Ellipsoidal Earth, Elliptical Orbit
         156 Control Points
         156 Check Points
         σ_control: 0

Case R:  Model: Ellipsoidal Earth, Elliptical Orbit
         25 Control Points
         156 Check Points
         σ_control:  σ_x = σ_y = σ_z = 15 m
                     σ_total = 26 m
FIGURE 2  SATELLITE ORBITAL PLANE

FIGURE 4b

FIGURE 5  SCANNER COORDINATE SYSTEM

PROGRESS IN THE SCENE-TO-MAP
REGISTRATION INVESTIGATION


D.D.  DOW 

NASA/National  Space  Technology  Laboratories 
Earth  Resources  Laboratory 


ABSTRACT 


This investigation focuses on the geometric accuracy of the scene-to-map registration process for P-format Landsat MSS data for scenes from Kansas and Louisiana/Mississippi. Large scale row and column bias values and row and column standard deviation values were measured for the P-format data sets, indicating a poor georegistration accuracy for these geometrically corrected Landsat MSS scenes. Experimental work is underway with A-format Landsat MSS scenes from the same locations to examine the influence of the number of ground control points and the spatial distribution of ground control points on geometric registration accuracy. An early conclusion from this work is that the root mean square approach for assessing how well the ground control points fit the mapping equations measures a different aspect of georegistration accuracy than does the approach of evaluating the bias (offset) and standard deviation using independently chosen ground reference points.


INTRODUCTION 


The scene-to-map registration process is a crucial step in the
preprocessing of Landsat Multispectral Scanner (MSS) and Thematic
Mapper (TM) data. Georeferenced Landsat MSS products approach the
national map accuracy standards for the 1:250,000 scale (USGS, 1979a).
This has resulted in the utilization of the Landsat data to develop
map products, to serve as a component of a multisource data base, and
to detect changes in land cover categories through a comparison of
post-classification products developed at two different points in
time. The registration and rectification of Landsat data are
accompanied by geometric offsets resulting from the remapping
techniques employed and radiometric distortions resulting from the
resampling functions used. This study focuses on the factors
influencing geometric fidelity. The factors to be examined include the
spatial distribution of the ground control points utilized and the
number of ground control points employed. The influence of resampling
functions on geometric errors should be less than half a pixel and
would only become an important factor for georeferencing Landsat
products at a sub-pixel level of accuracy.

Landsat computer compatible tapes (CCT) are available in the
A-format, which has been radiometrically corrected, and in the
P-format, which includes radiometric and geometric corrections. The
A-format Landsat MSS data is processed through the Master Data
Processor (MDP) at Goddard Space Flight Center to remove the gap
problem inherent in MSS data, without resampling the data. The
P-format Landsat MSS data comes in a geometrically converted form
which in the standard product employs a Hotine Oblique Mercator (HOM)
projection as a map base and cubic convolution resampling. The users
of Landsat data for geographic information systems face the problem
that the base map projection for their work most often utilizes the
Universal Transverse Mercator (UTM) system, while the base for A- and
P-format MSS data is the HOM system and for the Landsat 4 TM data is
the Space Oblique Mercator (SOM) system. The EROS Data Center Digital
Image Processing System (EDIPS) in Sioux Falls, South Dakota has
developed software to convert from one of the above map projection
systems to another. The UTM system imparts a scale distortion of
1 part in 1,000 (1:1,000) compared to the 1:10,000 distortion
associated with the SOM and HOM projections (USGS, 1980a, b, and c).

The number of ground control points (GCPs) used to geometrically
correct P-format Landsat MSS tapes is listed in the CCT header record
as the quality assessment number. The quality assessment number is
the truncated integer of the expression (N + 7)/8, where N is the
number of control points used. If no GCPs were utilized, then the
P-format CCT is referred to as system corrected. Currently, all of
the Landsat 4 TM tapes are system corrected to produce P-format
products. For Landsat MSS products that have been system corrected,
the georegistration accuracy will be within 60 pixels 99 percent of
the time. When 25 to 50 GCPs are used in a Landsat scene, the
georegistration accuracy will be within 1 pixel more than 99 percent
of the time. The georegistration accuracy is 10 pixels for 8 to 24
GCPs and 20 pixels for 1 to 7 GCPs (Nelson and Grebowsky, 1982). A
recent study by Graham and Luebbe (1981) showed that the quality
assessment number is not necessarily a good indicator of registration
accuracy.
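
The relation between the number of GCPs and the quality assessment
number can be illustrated with a short sketch. The function names
below are hypothetical and chosen only for illustration; the sole
input taken from the text is the expression (N + 7)/8 truncated to an
integer.

```python
def quality_assessment_number(n_gcps):
    """Truncated integer of (N + 7)/8, as described in the text."""
    if n_gcps < 0:
        raise ValueError("number of GCPs cannot be negative")
    return (n_gcps + 7) // 8


def gcp_range_for(qa_number):
    """Smallest and largest N that map to a given quality assessment number."""
    if qa_number == 0:
        return (0, 0)  # system corrected, no GCPs used
    return (8 * qa_number - 7, 8 * qa_number)


if __name__ == "__main__":
    for qa in range(0, 5):
        low, high = gcp_range_for(qa)
        print(f"quality assessment number {qa}: {low} to {high} GCPs")
```

Under this expression a quality assessment number of 2 corresponds to
9 to 16 GCPs and a number of 3 to 17 to 24 GCPs, so the number only
brackets the GCP count in steps of eight.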

Investigations of scene-to-map registration accuracy can be
divided into theoretical and empirical studies. Some examples of each
type of investigation will be discussed in the following sections,
beginning with the theoretical approach. Sawada et al. (1981)
developed an analytical model utilizing satellite orbit/attitude
information from the Scene Image Annotation Tape (SIAT) plus data on
characteristics of the MSS scanning mechanism to correct geometrical
distortions to within one pixel accuracy utilizing 3 GCPs to estimate
nonlinear scan mirror corrections and 20 GCPs for error estimation. A
second approach is to fit MSS images to ground control by means of
different mathematical models and to analyze the residuals for each
mathematical model as a means of determining which model will produce
the greatest geometric accuracy given a specified configuration of
GCPs (Wong, 1975; Steiner and Kirby, 1977; Dowman and Mohamed, 1981).
Wong (1975) achieved the best results with a 20 term polynomial
employing 25 to 30 ground control points with a reported limiting
geometric accuracy of ±55 meters. Dowman and Mohamed (1981) achieved
a root mean square (rms) error of 83 meters using no GCPs, while the
rms error was approximately 60 meters when 20 GCPs were used.

The empirical approach to the scene-to-map registration accuracy
assessment involves selecting a second set of independently chosen
ground reference points (GRPs) and comparing their location on the map
with that in the georeferenced Landsat MSS product. A system
corrected P-format product accuracy assessment reported a standard
error in both directions of 160 meters, which was reduced to 50 meters
after the application of a linear least-square analysis correction
procedure (USGS, 1979a). A second study of P-format data which
employed GCPs from a 1:24,000 scale topographic map reported rms
errors of 218 meters in the east-west direction and 880 meters in the
north-south direction (Colwell et al., 1980). The first 12 lines of
Table 1 present the results of a recent investigation that examined 12
different Landsat MSS scenes in the P-format and compared the location
accuracy of the tick marks in the Landsat scene by using independently
chosen GRPs (Graham and Luebbe, 1981). The row offset (bias) over 12
Landsat scenes varied from -414.8 to 15.8, while the column offset
(bias) varied from -0.8 to 9.5. In this case the results are given in
multiples of the size of one georegistered pixel (57 meters). All of
these studies suggest a need for a systematic investigation of the
problems with P-format MSS data that cause distortions in the
scene-to-map registration process.

METHODS 

The Landsat MSS frames to be used in this study were acquired over
southeastern Louisiana and coastal Mississippi (path: 23; row: 39 of
the worldwide reference system) and over eastern Kansas and
western Missouri (path: 29; row: 33). The Kansas data was collected
on 11/9/81 and had a quality assessment number of 2, while the
Louisiana data was gathered on 11/21/81 and had a quality assessment
number of 3. Both Landsat MSS scenes had 10 percent cloud cover. The
Louisiana Landsat scene includes open water (Lake Pontchartrain) areas
and wetlands adjacent to the metropolitan New Orleans area in which it
is difficult to choose GCPs and GRPs. The Kansas Landsat scene was
more amenable to choosing evenly spaced GCPs and GRPs.

The points to be utilized for GCPs and GRPs were chosen on
1:24,000 scale, 7.5 minute quadrangle sheets produced by the U.S.
Geological Survey (USGS). Where possible, three ground control or
reference points were located on each 7.5 minute quadrangle sheet and
the same points were identified on the Landsat scene of A-format MSS
tapes. The ground points' map coordinates were recorded in the UTM
system as northings and eastings, while the Landsat coordinates were
recorded as rows and elements. For the Louisiana P-format Landsat MSS
scene 192 ground points were chosen, while 359 ground points were used
for the A-format data. For the Kansas P-format Landsat MSS scene 145
ground points were chosen and 356 ground points were picked for the
A-format CCT. More points were utilized for the A-format data, since
the points had to be used both as GCPs to carry out the
georegistration procedure and as GRPs to independently check the
accuracy of the georegistration procedure. The types of features used
as ground points included manmade (road intersections) and natural
(river intersections) categories. Steiner and Kirby (1977) discuss the
accuracy with which ground points can be chosen both on maps and in
Landsat scenes. Since there is excellent registration between bands
in the MSS (Colvocoresses and McEwen, 1973), it is not necessary to
make corrections in ground point locations on the Landsat scene when
different MSS bands have been utilized in detecting the ground
features.

The approach used to measure the accuracy of the GCPs as a set was
to compare them to a linear polynomial model of the form:

(1)  SL = A_1 + A_2 E + A_3 N + e

(2)  CE = B_1 + B_2 E + B_3 N + e

where "SL" represents the scan line coordinate, "CE" represents the
corrected elements, "E" represents the UTM easting, "N" represents the
UTM northing, "A_1" to "A_3" and "B_1" to "B_3" are constants, and "e"
represents the residual error. The root mean square (rms)
determination quantifies how far the measured GCP coordinates differ
from the GCP coordinates computed from the linear polynomial model.
That is:

(3)  RMS = \sqrt{ \sum [ SL_{measured} - (A_1 + A_2 E + A_3 N) ]^2 / dF }

(4)  RMS = \sqrt{ \sum [ CE_{measured} - (B_1 + B_2 E + B_3 N) ]^2 / dF }

where the terms are defined as before and dF equals the degrees of
freedom.

When the residual error was large for a given GCP, this suggested
the possibility that the ground point coordinates may have been
misread from either the map or the Landsat image. A check was made of
the coordinates and corrections were made where necessary. If the
point coordinates appeared to be accurate and the point had a large
residual error, the point was kept. The rms value is a measure of how
well the set of GCPs employed fit the mapping equations (linear
polynomial model).
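
The fit of equations (1) and (3) can be sketched as follows. This is
a minimal illustration rather than the ELAS implementation: the
function name is hypothetical, the coefficients are estimated by
ordinary least squares, and the degrees of freedom are assumed to be
the number of GCPs minus the three fitted constants, which the text
does not state explicitly.

```python
import math


def fit_linear_model(points):
    """Least-squares fit of value = c1 + c2*E + c3*N.

    points: list of (easting, northing, value) tuples, where value is the
    measured scan line (or element) coordinate of a GCP.
    Returns ((c1, c2, c3), rms) with rms = sqrt(sum of squared residuals / dF).
    Assumption: dF = number of points - 3 (three fitted constants).
    """
    n = len(points)
    if n < 4:
        raise ValueError("need at least 4 points for a positive dF")
    e_mean = sum(p[0] for p in points) / n
    n_mean = sum(p[1] for p in points) / n
    v_mean = sum(p[2] for p in points) / n

    # Centering the coordinates keeps the sums small for large UTM values
    # and reduces the normal equations to a 2 x 2 system for the slopes.
    see = sen = snn = sev = snv = 0.0
    for e, nth, v in points:
        de, dn, dv = e - e_mean, nth - n_mean, v - v_mean
        see += de * de
        sen += de * dn
        snn += dn * dn
        sev += de * dv
        snv += dn * dv

    det = see * snn - sen * sen
    if det == 0.0:
        raise ValueError("points are collinear; the model is not determined")
    c2 = (sev * snn - snv * sen) / det  # coefficient of the easting
    c3 = (snv * see - sev * sen) / det  # coefficient of the northing
    c1 = v_mean - c2 * e_mean - c3 * n_mean

    ssr = sum((v - (c1 + c2 * e + c3 * nth)) ** 2 for e, nth, v in points)
    rms = math.sqrt(ssr / (n - 3))
    return (c1, c2, c3), rms
```

A GCP with a large individual residual can then be flagged for the
coordinate check described above by comparing its residual with the
rms value of the set.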

To evaluate the georegistration accuracy of P-format Landsat MSS
data, an independently chosen set of ground reference points (GRP) was
selected. The procedure of Graham and Luebbe (1981) was used to
quantify the georegistration accuracy in terms of RBIAS (row offset),
CBIAS (column offset), RSD (row standard deviation) and CSD (column
standard deviation). High georegistration accuracy would be
characterized by sub-pixel bias and standard deviation values. The
equations for computing bias and standard deviation are:

(5)  RBIAS = \frac{ \sum_{i=1}^{NP} (ROW1_i - ROW2_i) }{ NP }

(6)  RSD = \sqrt{ \sum_{i=1}^{NP} (ROW1_i - ROW2_i - RBIAS)^2 / [...] }


where NP is the number of GRPs chosen, ROW1 is the Landsat row
determined using the EROS software, and ROW2 is the Landsat row read
from the Landsat imagery. For the A-format Landsat MSS tapes, ROW1 is
the Landsat row determined using the mapping equations which are
computed from the GCPs. The ELAS module TRAN, which contains the EROS
subroutine PIXGEO, converts UTM coordinates to Landsat row and column
(elements) coordinates. The error introduced by the module TRAN is
less than ±1/2 Landsat pixel (Graham and Luebbe, 1981). The operation
of the module TRAN was checked by comparing the apparent and
actual location of the tick marks on the P-format Landsat MSS tape.
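
The bias and standard deviation measures can be sketched in a few
lines. The function name is hypothetical. The divisor of the standard
deviation is not legible in the source, so the sketch assumes the
sample form (NP - 1) and exposes it as a parameter; the same function
serves for rows (RBIAS, RSD) and columns (CBIAS, CSD).

```python
import math


def bias_and_sd(predicted, observed, ddof=1):
    """Mean offset and standard deviation of predicted minus observed.

    predicted: row (or column) of each GRP from the mapping software (ROW1).
    observed:  row (or column) of each GRP read from the imagery (ROW2).
    ddof:      assumed divisor correction; NP - ddof is used for the SD.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    diffs = [p - o for p, o in zip(predicted, observed)]
    n_points = len(diffs)
    if n_points <= ddof:
        raise ValueError("too few points")
    bias = sum(diffs) / n_points
    variance = sum((d - bias) ** 2 for d in diffs) / (n_points - ddof)
    return bias, math.sqrt(variance)


if __name__ == "__main__":
    row1 = [10.0, 12.0, 14.0]  # illustrative values only
    row2 = [9.0, 12.0, 12.0]
    print(bias_and_sd(row1, row2))
```

A uniform shift of the whole scene appears only in the bias, while
distortions that vary across the scene raise the standard deviation.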

One of the objectives of this study is to determine how the
spatial distribution of GCPs influences the resulting accuracy of the
georegistration process. To characterize the spatial distribution of
points, the approach of measuring the distance from a point to its
nearest neighbor, irrespective of direction, was employed (Clark and
Evans, 1954). The module CSPA was developed to compute the parameter
"R", which compares the mean observed nearest neighbor distance to the
mean nearest neighbor distance expected if the population were
distributed at random. The "R" values can range from 0 (maximum
aggregation or clumping of points) to 2.15 (maximum spacing or a
regular distribution of points). In this analysis "R" values between
0.7 and 1.3 were taken to indicate a random distribution of points,
values below 0.7 indicated a clustered distribution, and values above
1.3 indicated a regular distribution of points. Figure 1 shows the
spatial distribution of GCPs.
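
A minimal sketch of the nearest neighbor statistic follows. It is not
the CSPA module, whose internals are not described here; the function
names are hypothetical, the expected mean distance for a random
pattern is taken as 1/(2*sqrt(density)) following Clark and Evans, and
no edge correction is applied.

```python
import math


def nearest_neighbor_r(points, area):
    """Ratio of observed to expected mean nearest neighbor distance.

    points: list of (x, y) coordinates in the same length unit.
    area:   area of the study region in the square of that unit.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    observed = total / n
    expected = 1.0 / (2.0 * math.sqrt(n / area))
    return observed / expected


def classify_distribution(r):
    """Apply the thresholds used in this analysis (0.7 and 1.3)."""
    if r < 0.7:
        return "clustered"
    if r > 1.3:
        return "regular"
    return "random"


if __name__ == "__main__":
    grid = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
    r = nearest_neighbor_r(grid, area=4.0)
    print(r, classify_distribution(r))
```

With only 8 GCPs the statistic is sensitive to the area assigned to
the study region, so the same area definition should be used for
every set of points being compared.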

RESULTS  AND  DISCUSSION 

The georegistration accuracy assessment of the P-format Landsat
MSS tapes is given on the last two lines in Table 1. Both the Kansas
and the Louisiana/Mississippi P-format MSS data show high RBIAS and
RSD values and fairly high CBIAS and CSD values. The other values in
Table 1 are the results of Graham and Luebbe (1981) using the same
accuracy assessment methodology. For data sets 5 and 6, which had high
RBIAS values, Graham and Luebbe (1981) attributed the error to
inaccuracies in the tick mark registration information on the CCT.
The fact that the 1981 data for Kansas and Louisiana/Mississippi
had both high BIAS and SD values suggests that some other factor is
responsible for the very poor georegistration accuracy. A visual
examination of the Kansas P-format data for 1981 revealed that the
section boundaries which should have been square on the Landsat image
were instead rectangular and that roads that ran north-south on the
map ran northwest-southeast on the Landsat image. This information
suggests that the 1981 P-format data for Kansas and
Louisiana/Mississippi is distorted in other ways besides a simple
north-south translation.

The P-format ground points for the 1981 data for Kansas and
Louisiana/Mississippi were divided into 8 to 32 randomly chosen GCPs
with the rest of the ground points used as GRPs. The GCPs were run
through the ELAS georegistration module BMGC and the BIAS and SD were
computed as explained in equations (5) and (6). The results of this
analysis are given in Table 2, where it can be seen that the ELAS
georegistration procedures (Graham et al., 1980) operated on P-format
data gave sub-pixel geometric accuracy. Since this procedure involves
resampling the data twice, it presumably introduces radiometric
distortions into the data. The RBIAS results for Louisiana and the
CBIAS results for Kansas suggest a trend of decreasing BIAS values
through the use of increasing numbers of GCPs. No firm conclusions
can be drawn in this regard, since the study was done without any
replicates. Table 3 presents a similar type of study using A-format
MSS data without any replicates. For a given number of GCPs the
A-format data appears to have lower BIAS and SD values than does the
P-format data. The important conclusion is that both the A-format and
P-format data provide sub-pixel georegistration accuracy when as few
as 8 GCPs are used on a whole Landsat scene. This study chose GCPs in
groups of eight, so that when 16 GCPs were used in one run and 24 GCPs
were utilized in the next run, the two sets of data shared 16
randomly chosen GCPs in common. This procedure was followed to reduce
the variation in the different data sets.
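
The nested selection of GCPs in groups of eight can be sketched as
below. The function name, the use of a seeded shuffle, and the integer
point identifiers are assumptions made for illustration; the text
only specifies that each larger set contains the smaller one and that
the remaining ground points serve as GRPs.

```python
import random


def nested_gcp_runs(point_ids, sizes=(8, 16, 24, 32), seed=0):
    """Split ground points into nested GCP sets and complementary GRP sets.

    The points are shuffled once; each run takes the first `size` points
    as GCPs, so a larger run always contains every GCP of a smaller run.
    Returns a list of (gcps, grps) pairs, one per entry in sizes.
    """
    rng = random.Random(seed)
    shuffled = list(point_ids)
    rng.shuffle(shuffled)
    if max(sizes) >= len(shuffled):
        raise ValueError("not enough ground points to leave any GRPs")
    return [(shuffled[:size], shuffled[size:]) for size in sizes]


if __name__ == "__main__":
    # 192 ground points, as chosen for the Louisiana P-format scene.
    for gcps, grps in nested_gcp_runs(range(192)):
        print(len(gcps), "GCPs,", len(grps), "GRPs")
```

For 192 ground points this yields 184, 176, 168, and 160 GRPs for the
four runs, which matches the GRP column for Louisiana in Table 2.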

The next phase of the study was to examine the influence of the
spatial distribution of GCPs on the accuracy with which it is possible
to georegister A-format MSS data utilizing the ELAS scene-to-map
registration software (Graham et al., 1980). The results of the
initial phase of this investigation are presented in Table 4. This
analysis involved 20% of a Landsat scene and utilized 8 GCPs to
develop the mapping equations, with the rest of the ground points
acting as GRPs in order to quantify the georegistration accuracy. None
of the numbers in Table 4 are statistically different at the P = 0.10
level of significance for the 5 replicates measured for the Kansas and
Louisiana data. The general trend is for the BIAS values for rows and
columns to increase in magnitude as one goes from a random to a
regular to a clustered distribution. There is no clear general trend
apparent for SD results. The CSPA module with its numerical criteria
was used as described in the methods to distinguish whether the
distribution of 8 GCPs followed a random, regular or clustered
pattern.

The next phase of the study, utilizing 20% of a Landsat scene,
examined the question of the relative importance of the number of GCPs
versus the spatial distribution of the GCPs. Since the number of
ground points in 20% of a Landsat scene varied from 28 to 40, it was
decided to combine the Kansas and Louisiana data sets for this
analysis. The results are presented in Table 5. The general trend is
for the clustered distribution of points to have greater geometric
inaccuracy (both BIAS and SD) than the random distribution of points,
both for the case of 8 (statistically significant CBIAS results) and
16 GCPs. In going from 8 GCPs with a random distribution to 16 GCPs,
the random distribution exhibits greater georegistration accuracy for
both BIAS and SD than does the clustered spatial distribution of
points. This is another area in need of additional work, but the
preliminary analysis suggests that georegistration accuracy is more
sensitive to the number of GCPs used than it is to the spatial
distribution of GCPs.

A final question of interest is the relationship between the RMS
method of assessing georegistration accuracy and the method of Graham
and Luebbe (1981) that uses an independent set of GRPs to compute BIAS
and SD values. Table 6 presents a correlation analysis to answer this
question. The "N" is the number of observations, the "M" is the slope
and the "b" is the intercept of the regression equation, and "r"
represents the correlation coefficient, which varies between 1 and -1.
The fact that the correlations are not statistically significant at
the 5 percent level suggests that the RMS value and the BIAS and SD
measurements are quantifying different concepts. One would expect
this result from theory, but many Landsat practitioners falsely
utilize the RMS value as a measurement of how accurate the
scene-to-map registration process is. The georegistration accuracy
needs to be measured independently and the procedure of Graham and
Luebbe (1981) is one approach.
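
The "N.S." entries of Table 6 can be checked with a short sketch. The
test used in the original analysis is not stated; the sketch assumes
the usual two-tailed t test for a correlation coefficient with N - 2
degrees of freedom, for which the 5 percent critical value at 17
degrees of freedom is 2.110. The function and variable names are
hypothetical.

```python
import math

# Two-tailed 5 percent critical value of Student's t for 17 degrees
# of freedom (N = 19 observations, two fitted parameters).
T_CRITICAL_DF17 = 2.110

# Correlation coefficients of RMS against each measure, from Table 6.
TABLE6_R = {"RBIAS": 0.434, "RSD": -0.288, "CBIAS": 0.406, "CSD": -0.224}


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)) for a correlation from n pairs."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    for name, r in TABLE6_R.items():
        t = t_statistic(r, 19)
        verdict = "significant" if abs(t) > T_CRITICAL_DF17 else "N.S."
        print(f"{name}: r = {r:+.3f}, t = {t:+.2f}, {verdict}")
```

Under this assumption none of the four coefficients exceeds the
critical value, in agreement with the significance column of Table 6.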

A.  REGULAR DISTRIBUTION OF POINTS

B.  RANDOM DISTRIBUTION OF POINTS

C.  CLUSTERED DISTRIBUTION OF POINTS


Figure 1.  MAJOR CATEGORIES OF GROUND CONTROL POINT DISTRIBUTION

Table 1.  P-Format Georegistration Accuracy Assessment.

DATA SET   LANDSAT      ASSESSMENT   DATE GEN     RBIAS      RSD     CBIAS     CSD
           MISSION NO   NUMBER       BY MDP

1          2            5            7/23/79        0.5      1.1      -0.3     1.3
2          2            4            7/29/79        0.9      2.4       0.1     1.1
3          2            1            8/30/79        0.2      1.3      -0.2     1.0
4          3            2            4/23/80       15.8      3.9       0.6     1.7
5          2            1            5/18/80     -414.8      5.3       9.2     0.9
6 (KS)     2            0            5/18/80     -407.4      4.2       9.5     1.0
7          2            3            5/12/79        0.7      1.1       1.4     1.0
8          3            2            6/04/79        1.3      1.1      -0.8     1.2
9          3            2            9/15/80        0.3      1.1      -0.8     1.2
10         3            3            2/15/80       -3.6      1.7       3.2     1.6
11         2            2            8/05/79        2.1      1.5       0.2     2.5
12         2            4            5/28/80       10.5      2.3       9.0     1.3
LA/MS      2            3            11/21/81    -219.4    220.8     -95.6    48.8
KS         2            2            11/09/81     251.8    226.7     100.3    40.9


Table 2.  P-Format Study of Whole Landsat Scene

Location -
GCPs Used     RMS     RBIAS     RSD     CBIAS     CSD     GRP

La. - 8        90     -0.57    0.06      0.49    0.11     184
La. - 16       83     -0.36    0.06      0.63    0.11     176
La. - 24       86     -0.26    0.06      0.60    0.11     168
La. - 32       97     -0.09    0.07      0.44    0.11     160
KS - 8         66      0.10    0.06     -0.79    0.12     145
KS - 16        92      0.04    0.06     -0.33    0.13     137
KS - 24       101      0.11    0.07     -0.18    0.13     129
KS - 32        96      0.09    0.07     -0.15    0.14     121

Table 3.  A-Format Study of Whole Landsat Scene

Location -
GCPs Used     RMS     RBIAS     RSD     CBIAS     CSD     GRP

La. - 8        65     -0.13    0.06      0.01    0.05     351
La. - 16       73      0.06    0.05     -0.09    0.05     343
La. - 24       76      0.21    0.05     -0.16    0.05     335
La. - 32       71      0.17    0.06     -0.14    0.05     327
KS - 8         45     -0.02    0.05     -0.20    0.05     348
KS - 16        41      0.01    0.05     -0.09    0.05     340
KS - 24        46      0.06    0.05     -0.10    0.05     332
KS - 32        51      0.03    0.05     -0.07    0.05     324

Table 4.  Influence of Spatial Distribution of Ground Control Points on
          Georegistration Accuracy

Location and Type    RMS(1)    RBIAS(2)    RSD(2)    CBIAS(2)    CSD(2)

KS - Random           37.8       0.29       0.43       0.35       0.29
KS - Regular          49.6       0.32       0.39       0.36       0.25
KS - Clustered        46.4       0.88       0.49       0.49       0.49
LA - Random           48.0       0.15       0.24       0.29       0.20
LA - Regular          38.6       0.64       0.27       0.75       0.36
LA - Clustered        46.8       0.80       0.34       0.83       0.30

(1) meters
(2) pixels

NOTE:  Based on 20% of a Landsat Scene of A-Format data, 8 GCPs, and
5 Replicates; none of the above numbers are statistically different at
the 10% level of significance.

Table 5.  Influence of the Spatial Distribution and Number of Ground
          Control Points on Georegistration Accuracy

Number and Type      RMS(1)    RBIAS(2)    RSD(2)    CBIAS(2)    CSD(2)

8 - Random            52.6       0.39       0.24       0.17*      0.27
8 - Clustered         44.7       0.85       0.41       0.78*      0.42
16 - Random           55.0       0.28       0.27       0.20       0.25
16 - Clustered        49.0       0.38       0.32       0.39       0.37

(1) meters
(2) pixels

* Statistically different at 10% level of significance

NOTE:  Based on 20% of a Landsat Scene of A-Format Data and 7 Replicates
(Louisiana and Kansas).

Table 6.  Correlation Analysis: RMS vs. Absolute Value of BIAS and SD

Statistical
Parameter      N        M          b          r       Significance

RBIAS         19     0.0047     0.0572      0.434     N.S.
RSD           19    -0.0020     0.3695     -0.288     N.S.
CBIAS         19     0.0041    -0.0968      0.406     N.S.
CSD           19    -0.0016     0.3231     -0.224     N.S.

NOTE:  Based on a random distribution of points and 8 GCPs.


RELATING SPATIAL PATTERNS IN IMAGE DATA
TO  SCENE  CHARACTERISTICS 


Alan  H.  Strahler 
Curtis  E.  Woodcock 

Department  of  Geology  and  Geography 
Hunter  College 


ABSTRACT 


In  remote  sensing,  the  primary  goal  is  accurate  scene  inference, 
in  which  characteristics  of  the  scene  are  inferred  from  the  image  data. 
More  effective  inference  of  scene  characteristics  can  be  accomplished 
through  the  use  of  techniques  that  use  explicit  models  of  spatial 
pattern.  Spatial  patterns  in  image  data  are  functionally  related  to 
the  size  and  spacing  of  elements  in  the  scene  and  to  the  spatial 
resolution  of  the  image  data.  At  resolutions  where  variance  is  high, 
scene  inference  techniques  should  rely  heavily  on  data  from  the  spatial 
domain. As variance decreases, effective scene inference will
increasingly rely on spectral data.

INTRODUCTION

Central  to  the  field  of  remote  sensing  is  the  problem  of  scene 
inference,  in  which  the  characteristics  of  the  scene  are  inferred  from 
the  image  data.  Past  attempts  at  scene  inference  have  been  dominated 
by  spectral  pattern  recognition.  However,  remotely  sensed  measurements 
are  typically  arrayed  in  a systematic  fashion  corresponding  to  the 
areas  on  the  ground  from  which  the  measurements  were  made.  Thus, 
spatial  data  are  also  available  for  use  in  scene  inference. 

This  paper  presents  the  results  of  the  analysis  of  spatial  patterns 
in  image  data  by  two  methods  for  three  environments.  The  results 
enhance  our  understanding  of  the  relationship  between  spatial  pattern 
in  image  data  and  the  characteristics  of  the  ground  scene.  However, 
these  results  should  be  viewed  as  intermediate  in  nature,  because  they 
are  only  one  step  in  the  larger  process  of  developing  improved  methods 
of  using  spatial  data  in  scene  inference.  To  understand  the  role 
spatial data plays in scene inference, a conceptual model of the remote
sensing problem is necessary.

This  paper  serves  as  the  final  report  for  the  first  year  of  NASA 
Contract  9-16664,  Subcontract  L200080,  which  is  part  of  the  NASA 
Fundamental Research Program on Mathematical Pattern Recognition and
Image  Analysis.  In  addition,  this  paper  was  presented  at  the  17th 
International  Symposium  on  Remote  Sensing  of  the  Environment  in  Ann 
Arbor,  Michigan  in  May  1983. 


A Remote  Sensing  Model 


A remote  sensor  can  be  defined  as  a device  which  measures  the 
intensity  of  electromagnetic  radiation.  Associated  with  a sensor  is  a 
resolution  cell  (or  pixel),  defined  as  the  size  and  shape  of  the  areas 
in  the  field  of  view  over  which  the  electromagnetic  signal  strength 
is  integrated.  The  response  time  of  the  sensor  is  the  time  over  which 
the  received  signal  is  integrated.  Also  associated  with  a sensor  is  a 
response  function  describing  the  integration  over  wavelengths  in  the 
electromagnetic  spectrum,  and  a point  spread  function  defining  the 
integration  over  the  field  of  view  of  the  sensor.  A measurement  is  the 
output  of  a sensor  response  to  the  above  integrations.  A scene  is 
defined  as  the  spatial  and  temporal  distribution  of  matter  and  energy 
fluxes  from  which  the  sensor  can  draw  measurements.  An  image  is  a 
collection  of  measurements  from  a sensor  that  are  arrayed  in  a systematic 
fashion.  In  the  context  of  this  paper,  spatial  patterns  refer  to  the 
spatial  arrangement  of  measurements  in  an  image. 

The  measurements  produced  by  a sensor  can  be  seen  as  a function  of 
the  spatial  and  temporal  distribution  of  energy  and  matter  in  the  scene, 
the  characteristics  of  the  sensor,  and  the  scattering  and  absorption 
that occurs in the atmosphere between the scene and the sensor. A remote
sensing model, then, consists of three components: a scene model that
specifies the form and nature of the energy and matter within the scene
and their spatial and temporal order; an atmospheric model that describes
the interaction between the atmosphere and the energy emitted by the
scene; and a sensor model that describes the behavior of the sensor
in responding to the energy fluxes incident upon it and in producing
the measurements that constitute the image.

In  general,  the  remote  sensing  problem  can  be  presented  as  inferring 
the  order  in  the  properties  and  distributions  of  matter  and  energy  in 
the  scene  from  the  set  of  measurements  comprising  the  image.  Whether 
explicit or not, scene inference always implies the application of a
remote  sensing  model,  in  that  assumptions  must  always  be  made  concerning 
the  ground  scene,  atmosphere,  and  sensor.  The  problem  of  scene  inference, 
then,  becomes  a problem  of  model  inversion  in  which  the  order  in  the  scene 
is  reconstructed  from  the  image  and  remote  sensing  model. 

The  characterization  of  spatial  patterns  in  image  data  is  intended 
to  provide  an  improved  understanding  of  scene  models.  However,  an 
important  implication  of  this  work  concerns  the  relation  between  the 
size  of  the  elements  in  the  scene  and  the  size  of  the  resolution  cells 
in  the  image.  This  fundamental  property  of  the  sensor  system  has  important 
implications  in  the  characterization  of  spatial  pattern  in  image  data  and 
the  inversion  of  the  remote  sensing  model  for  scene  inference. 

Scene  Components 

In  specifying  scene  models,  it  is  necessary  to  define  the  entities 
or  objects  in  the  scene  that  are  to  be  considered.  These  entities  are 
actually an abstraction of a class of real objects in the scene, and
they will be referred to as elements. In this context, elements
are regarded as having uniform properties or parameters. These
properties  may  be  fundamental  and  invariant,  or  they  may  be  stochastic 
in  nature  --  i.e.,  characterized  by  distributions.  The  elements  in  a 
scene  can  vary  widely  according  to  the  interests  of  the  interpreter. 

Several examples of scene elements are: leaf, branch, plant, crop row,
tree, field, stand; lawn, house, car, street, garden, housing development;
airplane, building, runway, truck, airport. In addition to these elements,
which are essentially discrete entities, a particular type of element,
the background, should be recognized. The background is usually assumed
to  be  spatially  continuous  with  uniform  properties  and  parameters  and  is 
typically  obscured  partially  by  other  elements  in  the  scene.  Soil,  rock, 
snow,  and  vegetative  understory  are  examples  of  background  elements. 

For  the  purpose  of  this  paper,  geographic  distributions  refer  to  the 
spatial  arrangements  of  elements  in  a scene. 

Current  Use  of  Spatial  Scene  Models  in  Scene  Inference 

In  all  attempts  at  scene  inference,  assumptions  must  be  made  about 
the scene, sensor, and atmospheric models. For scene models, these
assumptions can either default to nonspatial forms or include
implicit  or  explicit  models  of  the  geographic  distribution  of  elements 
in  the  scene.  Most  remote  sensing  models  default  to  nonspatial  forms 
in  which  individual  measurements  are  processed  independently  of  their 
location  in  the  image  and  the  characteristics  of  their  neighbors. 


Conventional  supervised  and  unsupervised  techniques  both  default  to 
such  nonspatial  forms.  Another  group  of  remote  sensing  models  with 
nonspatial  scene  models  are  the  proportion  estimation,  or  mixture 
models.  Most  of  these  models  estimate  the  mixture  of  elements  within 
individual pixels {11, 5, 1}, but the CLASSY algorithm {10} estimates
proportions  of  unknown  elements  for  the  entire  image. 

Some  remote  sensing  models,  such  as  BLOB  {7},  ECHO  {8},  and  AMOEBA 
{2},  implicitly  assume  isotropic  high  spatial  autocorrelation  in  the 
scene  model.  In  these  approaches,  empirically  derived  constraints 
are  used  to  enhance  the  likelihood  that  adjacent  pixels  are  classified 
the  same.  These  approaches  are  most  effective  in  agricultural  areas, 
where  the  assumption  of  high  spatial  autocorrelation  is  valid.  However, 
to  date  there  has  been  no  attempt  to  determine  the  validity  of  this 
simple  spatial  model  for  other  environments  except  through  application 
of  the  model  and  evaluation  of  the  results. 

Haralick's sloped facet model {4} explicitly states the nature of
the spatial pattern in the image data. This model allows for linear
deviation in brightness values with distance, hence the sloped nature
of  the  facets.  Again,  there  has  been  no  attempt  to  determine  the  validity 
of  that  model  for  various  combinations  of  scene  elements  and  resolution 
cell  sizes.  Another  remote  sensing  model  with  an  explicit  spatial  model 
is  the  invertible  coniferous  forest  canopy  reflectance  model  of  Strahler 
and  Li  {13}.  The  model  requires  the  assumption  of  multiple  trees  per 
resolution cell for inversion. A Neyman Type A model of the spatial
distribution of trees is the explicit spatial model used in the
inversion process.

One group of remote sensing models uses measures of image texture
as the basis of scene inference. Haralick {3} provides an excellent
review  of  the  various  approaches  used  in  remote  sensing  as  well  as 
other  applications  which  use  image  processing.  In  general,  these 
texture-based  approaches  have  implicit  spatial  models,  and  in  some 
ways  are  similar  to  unsupervised  classification.  In  both  approaches, 
groups  of  pixels  derived  from  the  image  data  (on  the  basis  of  either 
spatial  or  spectral  patterns)  are  related  a posteriori  to  the  elements 
in the ground scene. In these approaches, no attempt is made to understand
the geographic processes in the scene that created the spatial
patterns  in  the  image  data.  In  this  respect,  all  work  relying  on  image 
texture  has  been  empirical. 

METHODS 


Whenever  remotely  sensed  data  consist  of  images,  an  important  new 
information  component  is  added  to  the  measurement  output  by  the  sensor 
--  its  spatial  position.  Since  the  position  of  the  measurement  in  the 
image  is  usually  a quantifiable  function  of  the  position  in  the  scene  of 
the  resolution  cell  from  which  it  is  derived,  each  measurement  can  be 
associated  with  a ground  location  and  be  positioned  relative  to  other 
measurements. From a statistical viewpoint, the sensor's response
then becomes a regionalized variable, that is, a random variable whose
position in time or space is known. (Due to sensor imperfections, individual
measurements  may  not  in  reality  be  entirely  independent  of  their 
neighbors.  However,  from  the  theoretical  viewpoint  presented  here, 
each  measurement  is  considered  an  independent  observation.) 

Assume that $Y(x)$ is a regionalized random variable associated with
location $x$. As an example, a digital image can be regarded as a single
realization of the variables $Y(x_i)$, where the set of $x_i$, $i = 1, \ldots, n$,
corresponds to the $n$ resolution cells in the image. If the $Y(x_i)$ are
uncorrelated, then the image will consist of random noise. If, however,
the $Y(x_i)$ are in some way related, then the data will exhibit spatial
structure. Perhaps the weakest assumption one can make about this
structure is what Matheron {6} refers to as the "intrinsic" hypothesis:
that the increments $Y(x_i + h) - Y(x_i)$ associated with a small distance $h$
are weakly stationary. Under this assumption, the first moment of the
increment, its expected value, is constant or at least only slowly
varying with spatial position $x$; and the second moment is also invariant
with spatial position.

The second moment,

$$2\gamma(h) = E\{[Y(x_i + h) - Y(x_i)]^2\},$$

is referred to as the variogram; $\gamma(h)$ becomes the semivariogram {6}.

Just as the variance characterizes the distribution of a nonspatial
random variable, the variogram characterizes the spatial structure of a
regionalized variable. Geostatisticians have used the variogram as a primary
tool to measure the zone of influence of each $Y(x_i)$ on the next, indicate
intermeshed structures, reveal anisotropy, and detect spatial
discontinuities {6}. The one-dimensional case is presented for
simplicity, but this approach is easily generalized to the multidimensional
case by considering $h$ to be a vector.
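
The one-dimensional semivariogram defined above can be illustrated with a short sketch. The function name `semivariogram` and the step-shaped test profile are assumptions introduced here for illustration only; they are not part of the software used in this study. The estimator simply averages squared increments at each lag and halves the result.

```python
def semivariogram(values, max_lag):
    """Estimate gamma(h) for h = 1..max_lag from a 1-D sequence.

    gamma(h) is half the mean squared increment Y(x_i + h) - Y(x_i),
    averaged over every pair of samples separated by h positions.
    """
    n = len(values)
    gamma = {}
    for h in range(1, max_lag + 1):
        sq_diffs = [(values[i + h] - values[i]) ** 2 for i in range(n - h)]
        gamma[h] = sum(sq_diffs) / (2.0 * len(sq_diffs))
    return gamma


# Hypothetical brightness profile: a dark element next to a bright one.
profile = [0, 0, 0, 0, 1, 1, 1, 1]
gamma = semivariogram(profile, 4)
for lag in sorted(gamma):
    print(lag, round(gamma[lag], 3))
```

For this profile the estimate grows with lag, because more and more pairs straddle the boundary between the two elements as the separation increases.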

A VICAR  (Video  Image  Communication  and  Retrieval  System)  program 
VRIOGRM  was  written  to  calculate  a two-dimensional  variogram  for  image 
data.  Ideally,  a variogram  should  be  computed  using  each  pixel  as  a 
center  or  target  point,  against  which  all  other  pixels  in  the  image  are 
compared.  Since  remotely  sensed  images  tend  to  be  large,  this  approach 
is  computationally  unrealistic,  and  constraints  need  to  be  imposed. 

One  constraint  concerns  the  distance  h over  which  the  variogram  is  to 
be  measured.  This  distance  can  be  thought  of  as  a "window  size"  when 
using  image  data  and  needs  to  be  larger  than  the  zone  of  influence  and 
large enough for any periodicities in the data to be revealed. Since
VRIOGRM produces a square variogram, $(2h + 1)^2$ pixels are compared with
any center point in the image.

The  second  constraint  concerns  the  selection  of  points  in  the  image 
to be used as centers of windows. In VRIOGRM, the number of pixels in
the image to be used as center points in the calculation of the variogram
is specified as a parameter. The actual locations to be used in
the image are determined randomly. When the locations in the image used
as center points are a sample of the entire image, it should be noted that
the resulting variogram must be considered an estimate of the true
variogram. The variograms shown in this paper are displayed as contour plots
of bivariate histograms.
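
The two constraints described above (a bounded window and randomly sampled center points) can be sketched as follows. This is not the VRIOGRM source, which is not reproduced in this paper; the function name `estimate_variogram`, its parameters, and the handling of image borders are assumptions made for illustration. Centers are drawn at random from the part of the image where the full window fits, and the squared difference between each center and every pixel in its window is accumulated by offset.

```python
import random


def estimate_variogram(image, h, n_centers, seed=0):
    """Return a (2h+1) x (2h+1) grid of semivariance estimates.

    Entry [dy + h][dx + h] is half the mean squared difference between a
    randomly chosen center pixel and the pixel offset from it by (dy, dx).
    """
    rows, cols = len(image), len(image[0])
    size = 2 * h + 1
    sums = [[0.0] * size for _ in range(size)]
    rng = random.Random(seed)
    for _ in range(n_centers):
        # Assumption: keep centers far enough from the border that the
        # whole window lies inside the image.
        r = rng.randrange(h, rows - h)
        c = rng.randrange(h, cols - h)
        for dy in range(-h, h + 1):
            for dx in range(-h, h + 1):
                d = image[r + dy][c + dx] - image[r][c]
                sums[dy + h][dx + h] += d * d
    return [[s / (2.0 * n_centers) for s in row] for row in sums]


# Hypothetical test scene: horizontal stripes with a period of 4 pixels.
rows_image = [[1 if r % 4 < 2 else 0 for _ in range(20)] for r in range(20)]
vg = estimate_variogram(rows_image, h=4, n_centers=50)
```

Because the centers are a sample, the result is an estimate; increasing `n_centers` trades computing time for a smoother surface.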


A second  method  used  to  measure  spatial  pattern  in  image  data  was 
that  of  graphs  of  local  variance  as  a function  of  spatial  resolution. 
Calculation  of  these  graphs  is  accomplished  by  measuring  local  variance 
in the image data, degrading the imagery to successively coarser
resolutions, and then measuring local variance at each new resolution. The
graphs provide insight into the size and nature of elements in the scene,
and can be used to help define the elements that should be used in scene
inference. At a time when remotely sensed data are becoming available
at continually finer spatial resolutions, these graphs should prove
invaluable  in  helping  understand  how  spatial  patterns  will  vary  for  given 
environments  as  a function  of  spatial  resolution. 

For  this  work,  local  variance  is  measured  for  any  image  as  the  mean 
value  of  a texture  image  created  by  the  VICAR  program  PIXSTAT.  In  this 
program,  the  standard  deviation  of  a 3 x 3 moving  window  of  pixels  is 
computed,  scaled,  and  placed  in  the  location  of  the  center  pixel.  Thus, 
for  each  window  a value  is  produced  that  indicates  the  local  tonal  variance, 
and  the  mean  value  for  the  entire  image  serves  as  a reasonable  measure  of 
the  overall  local  variance. 
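
A minimal sketch of this measure follows. It is not the PIXSTAT program itself: the function name `local_variance` is hypothetical, the scaling step is omitted, the population form of the standard deviation is assumed, and border pixels without a full 3 x 3 window are skipped.

```python
import math


def local_variance(image):
    """Mean, over interior pixels, of the 3 x 3 window standard deviation."""
    rows, cols = len(image), len(image[0])
    total, count = 0.0, 0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [image[r + dy][c + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            mean = sum(window) / 9.0
            var = sum((v - mean) ** 2 for v in window) / 9.0
            total += math.sqrt(var)
            count += 1
    return total / count


# A uniform image has no local variance; a one-pixel checkerboard has a lot.
flat = [[7] * 6 for _ in range(6)]
checker = [[(r + c) % 2 for c in range(6)] for r in range(6)]
lv_flat = local_variance(flat)
lv_checker = local_variance(checker)
```

On the checkerboard every window holds five pixels of one value and four of the other, so each window has the same standard deviation, sqrt(20)/9.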

The algorithm that has been used to degrade the imagery to successively
coarser resolutions simply averages the resolution cells to be combined
into a single larger resolution cell. This approach implies an idealized
square wave response on the part of the sensor and is limited to degradation
at integer multiples. Although point spread functions obviously differ
significantly from an idealized square wave response, the point at issue
here is the scene model, not the sensor model. Adopting such a simple
sensor  model  avoids  needless  complexity  at  this  stage  of  the  research. 

The  imagery  used  for  the  analysis  of  spatial  pattern  was  digitally 
scanned  from  color  aerial  transparencies  using  a microdensitometer,  thus 
allowing  the  analysis  of  spatial  pattern  at  finer  resolutions  than  are 
available  from  conventional  spaceborne  sensors.  Three  images  were 
scanned at different resolutions: a forest scene in South Dakota where
individual pixels are 0.75m on a side; a forest scene in Colorado with
pixels  1.5m  on  a side;  and  an  agricultural  scene  with  pixels  0.15m  on 
a side. 

RESULTS 

South  Dakota  Forest  Image 

Figure 2 shows the graph of local variance as a function of spatial
resolution for the South Dakota forest image. Local variance is low
at the resolution at which the photo was scanned, 0.75m (Figure 1A). At
this resolution, if a pixel falls on a tree, its immediate neighbors
are also likely to be on the tree, since individual trees comprise many
pixels. In this situation, the pixels in a 3 x 3 window are likely to
have similar DNs and the local variance will be low. Similarly, if a
pixel lies on the background, its neighbors are also likely to be on the
background, and local variance will again be low. Naturally, some pixels
will fall along the borders of the trees or background, and as a result
will have high local variance, but the mean local variance for the image
will still be low.

As the size of individual resolution cells increases, the number of
pixels comprising an individual tree decreases, and the likelihood that
surrounding pixels will be similar decreases (Figure 1B). In this
situation, local variance increases. This trend continues until a peak
in local variance is observed at approximately the size of individual
tree crowns, or 6m. At this resolution (Figure 1C), the pattern
becomes very mottled as individual pixels tend to be alternately
either on a tree or on the background, and the local variance is very
high. As the resolution cell size increases past this peak, local variance
decreases. This decrease is associated with individual pixels being
increasingly characterized by a mixture of both trees and background.
As this mixing of elements occurs, all pixels begin to look similar and
the local variance continues to decrease (Figures 1D-1G).
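
The rise and fall described above can be reproduced on a synthetic scene. The sketch below is an illustration under stated assumptions, not a reanalysis of the South Dakota data: the scene is a hypothetical checkerboard of 8 x 8 pixel "elements" on a contrasting background, and the helper names `degrade` and `local_variance` are introduced here, with block averaging and an unscaled 3 x 3 standard deviation standing in for the actual processing.

```python
import math


def degrade(image, f):
    """Average non-overlapping f x f blocks (idealized square-wave sensor)."""
    n_r, n_c = len(image) // f, len(image[0]) // f
    return [[sum(image[r * f + i][c * f + j]
                 for i in range(f) for j in range(f)) / float(f * f)
             for c in range(n_c)] for r in range(n_r)]


def local_variance(image):
    """Mean of the 3 x 3 window standard deviation over interior pixels."""
    stds = []
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            w = [image[r + dy][c + dx]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            m = sum(w) / 9.0
            stds.append(math.sqrt(sum((v - m) ** 2 for v in w) / 9.0))
    return sum(stds) / len(stds)


# Synthetic scene: bright and dark elements, each 8 pixels on a side.
scene = [[1.0 if (r // 8 + c // 8) % 2 == 0 else 0.0 for c in range(64)]
         for r in range(64)]

curve = {}
for factor in (1, 2, 4, 8, 16):
    curve[factor] = local_variance(degrade(scene, factor))
    print(factor, round(curve[factor], 3))
```

In this idealized scene local variance is low while elements span many cells, is highest when the cell size approaches the element size, and collapses to zero once every cell contains an equal mixture of element and background. The exact shape of the curve depends on the regular layout assumed here.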

There is considerable structure in the contour plot of the variogram
of the South Dakota forest image (Figure 3). The strength of the
relationship between a given pixel and its surroundings tends to decrease
with distance until it reaches a plateau at about the eighth contour
line. At this distance, the relationship between pixels is essentially
as if they were selected at random. Ideally, this portion of the contour
plot should be flat, but it appears to have local peaks and valleys.
This effect may be attributed to the fact that the contour plot is
derived from an estimated variogram. With increased sampling, this
mottled appearance may be reduced or even disappear.

Another notable feature of the variogram is its anisotropy, which
is  directly  attributable  to  the  shadowing  related  to  the  direction  of 
illumination  (Figure  1A).  The  variogram  is  markedly  elongated  along 
an  axis  approximately  diagonal  from  the  upper  right  corner  to  the  lower 
left  corner.  Since  shadows  look  more  like  the  trees  than  the  background, 
the  shadow  of  a tree  tends  to  reduce  the  variance  measured  in  the 
direction  of  the  shadow. 

Colorado  Forest  Image 

A picture  of  the  area  in  Colorado  digitally  scanned  from  an  aerial 
transparency  for  analysis  of  spatial  pattern  is  shown  in  Figure  4.  The 
photo  was  scanned  at  a ground  resolution  corresponding  to  1.5m  on  a side. 

The  graph  of  local  variance  as  a function  of  spatial  resolution  (Figure  5) 
has  the  same  basic  structure  as  was  observed  for  the  South  Dakota  forest 
image.  The  local  variance  begins  relatively  low,  as  individual  trees 
are  multipixel  elements,  peaks  at  approximately  the  size  of  an  individual 
tree,  and  then  decreases  as  resolution  size  increases.  Interestingly, 
local  variance  peaks  at  approximately  9.0m  in  this  image  (as  opposed  to 
6.0m  in  the  South  Dakota  forest  image);  this  effect  is  attributable  to 
the  larger  tree  crown  diameters  found  in  the  Colorado  frame. 

The structure of the variogram for the Colorado forest image (Figure 6)
is again similar to that of the variogram of the South Dakota forest image.
Variance is observed to increase with distance until it eventually reaches
a plateau. The zone of influence, or distance from the center to the
plateau, is larger in the Colorado forest image, as would be expected
given the larger trees in the area. This difference in variograms is
not obvious because the abscissa records the number of resolution cells
rather than a direct measure of distance. Since the Colorado forest
image data have resolution cells twice the size (on a side) of those in the
South Dakota forest image, its zone of influence is larger than it appears on
the graph. As noted with the South Dakota forest image, anisotropy in
the variogram is directly attributable to the direction of illumination.

While  the  results  of  the  Colorado  forest  image  data  are  quite 
similar to those for the South Dakota forest image, they serve the useful
purpose of substantiating the interpretation of the results from
these methods of the analysis of spatial pattern. Due to the highly
experimental nature of these methods, it is reassuring to find their
results consistently attributable to the characteristics of the two
different scenes. Another factor that may be important for future
analysis is that the Colorado forest image contains considerable
variability in canopy density. It will be interesting to see how the
variogram of this area changes when it is computed only for areas with
certain densities of trees. These tests may allow for an improved
understanding  of  the  sensitivity  of  variograms  to  changes  in  scene 
characteristics. 


Agricultural  Image 


A picture  of  the  agricultural  area  digitally  scanned  from  an  aerial 
transparency for analysis of spatial pattern is shown in Figure 7. The
original resolution of the digital data is 0.15m on a side; the photo was
scanned at such a fine resolution in an attempt to analyze spatial
structure within fields. Traditionally, the remote sensing community
has viewed agricultural fields as homogeneous elements, largely due to
the spatial resolution of the available data. However, as resolution
cells become smaller on future sensors, more spatial structure within
agricultural fields will be resolvable.

The  graph  of  local  variance  as  a function  of  spatial  resolution  for 
the  agricultural  image  does  not  show  the  same  structure  as  the  graphs  for 
the  forest  images,  in  that  there  is  no  initial  low  local  variance  (Figure  9). 
It was initially hypothesized that at very fine spatial resolutions,
agricultural images would exhibit a pattern in local variance similar to
that found in the forest images. In an agricultural setting, individual
plants or crop rows would be multipixel elements, and local variance
would be low. At a resolution of approximately the width of the crop
rows, the local variance would peak and then begin its familiar decline.
However, Figure 9 shows that local variance simply decreases as the
spatial resolution of the image data becomes coarser.

One reason that the initial low local variance did not occur is that
the spatial resolution of the data was not fine enough to detect the
homogeneity of the crop row as an element in the scene. The distance
between crop rows is approximately 5 resolution cells at the resolution
of 0.15m at which the data were originally scanned. Those five pixels
include the well illuminated portion of a crop row, the shaded
side of the crop row, and the space between the rows. As a result,
very few 3 x 3 windows in the image will have low variance. If the
resolution cell size were considerably reduced, variance within both the
shaded and well illuminated portions of a single crop row would be low.
However, for this effect to be observed, a spatial resolution on the order
of 5 cm would be required for this image. Another factor that may be contributing
to the lack of initial low variance is that the crop is in a mature stage,
and the crop rows have grown close together. Thus, there is not a well
developed background signal between rows, against which the crop rows
would be highly contrasting.

Variograms  were  computed  for  two  of  the  agricultural  fields  in  the 
image  and  the  entire  agricultural  image  as  a whole.  These  variograms 
exhibit  considerable  structure  related  to  the  orientation  and  spacing 
of  the  rows.  Figure  8A  shows  the  variogram  of  the  field  in  the  upper 
left  portion  of  the  agricultural  image  (Figure  7).  From  the  variogram 
it  is  easy  to  determine  both  the  direction  of  the  rows,  and  their  spacing. 
The  crop  rows  are  oriented  horizontally  in  this  portion  of  the  image,  as 
can  be  seen  by  the  low  variance  associated  with  horizontal  movement  in 
the  image.  Variance  changes  sharply  with  movement  across  the  rows, 
with variance increasing up to one half of the distance between rows.
From that point, variance decreases until a minimum is reached at the
distance between rows. This cycle of high variance at the half width
and low variance at whole multiples of the distance between rows is
repeated all the way to the edges of the variogram, and would continue
if  the  variogram  had  been  calculated  for  a larger  window  size.  The 
distance  between  rows  can  be  determined  by  counting  the  number  of  pixels 
between  the  ridges  or  valleys  in  the  variogram. 
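
Reading the row spacing from the variogram can be sketched as finding the first valley in a profile taken across the rows. The function name `row_spacing`, the synthetic brightness pattern (one bright pixel, one intermediate pixel, and three dark pixels per row, repeating every 5 pixels), and the 0.15m cell size used for the conversion are assumptions chosen to resemble the situation described in the text, not measurements from Figure 8A.

```python
def row_spacing(gamma):
    """Return the first lag h >= 1 at which gamma has a local minimum.

    gamma[h] is the semivariance at a lag of h pixels measured across
    the rows; gamma[0] is the zero-lag value and is never returned.
    """
    for h in range(1, len(gamma) - 1):
        if gamma[h] < gamma[h - 1] and gamma[h] <= gamma[h + 1]:
            return h
    return None


# Synthetic cross-row brightness profile with a period of 5 pixels.
signal = [9, 4, 1, 1, 1] * 8

# Semivariance across the rows for lags 0..12.
gamma = []
for h in range(13):
    pairs = [(signal[i + h] - signal[i]) ** 2 for i in range(len(signal) - h)]
    gamma.append(sum(pairs) / (2.0 * len(pairs)))

spacing_pixels = row_spacing(gamma)
spacing_meters = spacing_pixels * 0.15  # assumed 0.15m resolution cells
print(spacing_pixels, round(spacing_meters, 2))
```

With real data the valleys are shallower than in this noise-free profile, so some smoothing of the profile before searching for a minimum would probably be needed.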

A physical explanation of the periodicity in the variogram is as
follows. Regardless of where the starting point is relative to a crop
row, if you move in the direction perpendicular to the rows by the distance
between crop rows, you are likely to be in the same position relative
to a crop row. In this situation, since the pixels are positioned
similarly relative to the rows, they will have similar DNs and the
resulting variance will be low. Conversely, if you
move one half the distance between crop rows, the new location will be
very different relative to a crop row, and thus the difference in DNs of
the pixels and the resulting variance will be large.

For the field in the lower left portion of the image, the variogram
(Figure 8B) exhibits a structure similar to that of the previous variogram,
except that the row direction is rotated 90 degrees. The same pattern of ridges
and valleys occurs at the same spacing between rows. The pattern in
the variogram for the entire agricultural image (Figure 8C) is easier to
understand after looking at the variograms for the individual fields.
The variogram for the entire image simply superimposes the variograms
from fields with rows in perpendicular directions.

DISCUSSION  AND  CONCLUSIONS 

The  results  of  this  study  indicate  that  measures  of  spatial  pattern 
in  image  data  can  be  related  to  the  characteristics  of  the  elements  in 
the  scene.  The  results  of  this  analysis  of  spatial  pattern  should  be 
viewed  as  a first  step  in  understanding  the  relationship  between  scene 
models  and  spatial  patterns  in  image  data,  and  the  eventual  use  of 
spatial  data  in  scene  inference.  However,  based  on  the  results  presented, 
some  generalizations  about  the  use  of  spatial  data  in  scene  inference 
can  be  made. 

The  graphs  of  local  variance  as  a function  of  spatial  resolution 
give  an  indication  of  spatial  resolutions  where  the  use  of  spatial  data 
will  be  important,  as  a function  of  the  elements  in  the  scene.  At  spatial 
resolutions  where  local  variance  is  low  the  information  in  the  spatial 
domain  is  low,  and  scene  inference  based  solely  on  spectral  data  may  be 
appropriate.  However,  at  spatial  resolutions  where  local  variance  is 
high,  the  use  of  spatial  data  becomes  more  important,  as  the  use  of  only 
spectral  data  is  likely  to  yield  poor  results.  These  graphs  also  demonstrate 
that  local  variance  changes  as  a function  of  the  scene  characteristics  for 
a given  spatial  resolution.  For  example,  at  spatial  resolutions  in  the 
20-30m  range  (where  data  from  new  sensor  systems  will  soon  be  available), 
the forest images begin to exhibit higher spatial variability. However,
in the agricultural image, the local variance is still quite low at
those  resolutions.  These  results  indicate  that  the  use  of  spatial  data 
in  scene  inference  will  be  more  important  in  forested  scenes  than 
agricultural  scenes  when  using  data  from  these  new  sensors. 

The  graphs  of  local  variance  as  a function  of  spatial  resolution 
can  be  useful  in  helping  to  define  the  elements  in  a scene  and  the 
appropriate  remote  sensing  model  to  be  used  in  scene  inference.  The 
graphs  for  conifer  forests  presented  above  may  help  explain  the  results 
of  previous  studies  designed  to  test  the  influence  of  spatial  resolution 
on  forest  classification  accuracy.  Sadowski  and  Sarno  {12}  and  Latty 
and  Hoffer  {9}  both  found  that  classification  accuracies  decreased  in 
forested  areas  as  the  size  of  the  resolution  cells  decreased.  These 
decreasing  accuracies  are  almost  certainly  an  artifact  of  the  definition 
of  the  elements  in  the  scene  and  the  remote  sensing  model  used. 

Starting  with  large  resolution  cells,  the  elements  in  the  scene  are 
defined  as  forest  stands,  or  areas  large  enough  to  be  characterized  by 
numerous trees. The classification of forest stands is based on descriptions
that generalize the characteristics of trees in stands, and can be
thought of as forest types. In this situation, an element (or forest
stand) is a mixture of a variety of smaller objects. A simple conceptual
model of this mixture is a combination of trees and a homogeneous
background. With large resolution cells, individual pixels also will be
characterized by a mixture of trees and background, and will generally
be representative of the larger forest stand type. However, as the
resolution cell size decreases, individual pixels will be decreasingly
characterized  by  mixtures  of  trees  and  backgrounds.  Eventually, 
resolution  cells  tend  to  be  either  in  the  location  of  a tree  or  on  the 
background.  At  this  point,  the  elements  should  switch  from  forest  stand 
types  to  individual  trees.  However,  in  these  studies  the  elements,  or 
targets  of  classification,  remained  forest  types  throughout  the  study. 

The  decreases  in  accuracy  associated  with  shrinking  cell  size  may  be 
attributable  to  the  increasing  inappropriateness  of  the  remote  sensing 
model  used  for  scene  inference.  From  the  point  of  view  of  the  classifier, 
pixels  are  eventually  differentiated  into  tree  and  background  classes, 
all  in  areas  originally  designated  as  forest  types.  In  one  sense,  at 
small  resolution  sizes  the  accuracy  of  the  classification  could  surpass 
the  ability  of  the  techniques  available  to  evaluate  it.  This  situation 
suggests  a restructuring  of  the  question  of  what  accuracy  means  as  spatial 
resolutions  change. 

In  conclusion,  the  two  methods  of  measuring  spatial  patterns  in  image 
data  reveal  useful  and  different  information  concerning  the  characteristics 
of  the  elements  in  the  scene.  Variograms  illustrated  the  anisotropy  in 
the  data  attributable  to  the  direction  of  illumination,  found  periodicities 
in  the  data,  and  measured  the  zone  of  influence  of  pixels  on  their 
surroundings.  Variograms  are  a method  of  measuring  spatial  patterns  in 
image  data  that  may  be  useful  in  future  scene  inference  techniques  that 
rely  more  on  data  from  the  spatial  domain.  The  second  method,  graphing 
local variance as a function of spatial resolution, is most useful
because it readily displays the interaction between scene elements and
spatial  resolution.  Through  the  use  of  these  graphs,  more  informed 
decisions  can  be  made  concerning  the  nature  of  the  scene  inference 
techniques  to  be  used,  given  the  spatial  resolution  of  the  data 
available and the nature of the scene.

Figure 1A. South Dakota Forest Image at the original
resolution at which it was scanned. Each pixel is 0.75m
on a side.

Figure 1B. South Dakota Forest Image after degradation.
Each pixel in this image is 3.0m on a side and contains
16 of the original pixels.

Figure 1C. South Dakota Forest Image after degradation.
Each pixel in this image is 6.0m on a side.

Figure 1D. South Dakota Forest Image after degradation.
Each pixel in this image is 9.0m on a side.

Figure 1E. South Dakota Forest Image after degradation.
Each pixel in this image is 12.0m on a side.

Figure 1F. South Dakota Forest Image after degradation.
Each pixel in this image is 18.0m on a side.

Figure 1G. South Dakota Forest Image after degradation.
Each pixel in this image is 24.0m on a side.

Figure 2. Graph of local variance as a function of spatial
resolution for the South Dakota Forest Image data.

Figure 4. Photograph of the Colorado Forest scene that was
digitally scanned for analysis of spatial pattern.

Figure 5. Graph of local variance as a function of spatial
resolution for the Colorado Forest Image data.

Figure 7. Photograph of the Agricultural scene that
was digitally scanned for analysis of spatial pattern.

Figure 8C. Contour plot of the Two-Dimensional Variogram of
the entire Agricultural Image.

Figure 9. Graph of local variance as a function of spatial
resolution for the Agricultural Image.

Abstract

We review previous efforts to recover surface shape from image irradiance in
order to assess what can and cannot be accomplished. We consider the informational
requirements and restrictions of these approaches. In dealing with the question
of what surface parameters can be recovered locally from image shading, we
show that, at most, shading determines relative surface curvature, i.e., the ratio
of surface curvature measured in orthogonal image directions. The relationship
between  relative  surface  curvature  and  the  second  derivatives  of  image  irradiance  is 
independent  of  other  scene  parameters,  but  insufficient  to  determine  surface  shape. 
This  result  places  in  perspective  the  difficulty  encountered  in  previous  attempts  to 
recover surface orientation from image shading.

1. Introduction

The determination of land cover from aerial imagery is a task that photo
interpreters accomplish by using both the image data and their knowledge of the
structure  of  the  world.  The  image  data  encodes  the  complex  process  whereby  light 
is  reflected  from  a surface.  The  surface  shape,  the  surface  albedo,  the  position  of 
the  lighting  sources,  and  the  functional  form  of  the  reflectance  properties  of  the 
material  are  elements  of  this  encoding.  The  human  visual  system  interprets  image 
data  as  a 3-D  model  of  the  scene,  distinguishes  among  different  surface  materials, 
and  ascertains  the  position  of  the  lighting  sources.  It  is  difficult  to  believe  that 
a machine  vision  system  can  achieve,  say,  surface  material  differentiation  without 
simultaneously  being  able  to  recover  the  surface  shape  and  the  other  parameters 
that are needed to explain the detected image intensity. Of course, it may be possible
to use special sensors and multiple information sources to make it unnecessary
to  reconstruct  a complete  3-D  model  of  the  scene,  but  it  would  be  surprising  if  such 
specialization  could  retain  sufficient  generality  to  be  useful  over  a range  of  remote 
sensing  tasks,  e.g.,  in  both  renewable  and  nonrenewable  resources. 

The research reported herein was supported by the Defense Advanced Research Projects Agency
under Contract MDA903-83-C-0027 and by the National Aeronautics and Space Administration
under Contract NAS 9-16664. These contracts are monitored by the U.S. Army Engineer
Topographic Laboratory and by the Texas A&M Research Foundation for the Lyndon B. Johnson
Space Center.

The machine vision approach of simultaneously recovering all the parameters
necessary to account for image intensity is expressed in the notion of intrinsic images
[1] (or the 2½-D sketch [10]). These intrinsic images can be thought of as overlays,
each specifying the value of one parameter that goes into the formula for calculating
the image intensity. The images are not independent; if one is to be varied, the
others must be also, so that the predicted image intensity remains invariant (and
equal  to  the  observed  value).  The  notional  division  of  an  image  into  particular 
intrinsic  images  would  be  of  little  merit  unless  one  believed  that  estimates  of  each 
intrinsic  image  could  be  obtained  by  models  that  were  largely  independent  of  the 
other  intrinsic  images.  While  models  have  been  proposed  to  recover  various  intrinsic 
images, there have been considerable efforts made to recover the scene’s 3-D shape,¹
in  particular  the  surface  orientation  at  each  image  point.  These  ‘shape-from-...’ 
models  embody  a structure  that  would  allow  shape  to  be  recovered  principally  from 
a single  measure,  e.g.,  texture,  contour,  or  shading.  While  ‘shape-from-...’  models 
are  not  seen  as  complete  solutions  to  shape  reconstruction,  there  is  an  implicit 
expectation  in  their  title  that  shape  estimates  can  be  calculated  from  their  respective 
measures.  Here  we  review  the  work  we  and  others  have  done  towards  the  goal  of 
recovering  surface  shape  from  image  shading.  Is  it  attainable  — or  is  it  myth? 

The importance of shape recovery is clear; if the shape is known, surface albedo
and the other parameters that determine image intensity are obtainable. Land
cover differentiation is dependent on knowing the [relative] surface albedo, rather
than  image  parameters,  such  as  intensity.  If  we  cannot  recover  shape,  the  intrinsic 
image  approach  offers  little  as  a model  for  perception.  Shading  is  only  one  source 
of  shape  information.  Edge  information  is  of  great  importance,  but  there  is  little 
occlusion  in  aerial  images.  The  ability  to  recover  shape  from  shading  seems  more 
critical  in  the  case  of  aerial  imagery  than  for  most  other  types  of  imagery. 

We  first  review  three  research  efforts:  those  of  Horn  and  his  colleagues 

¹We use the expression surface shape to denote both the intrinsic properties of the surface, e.g.,
cylindrical,  and  the  orientation  of  the  surface  in  space.  Elsewhere,  shape  is  sometimes  used  to 
denote only the intrinsic properties of the surface, not its orientation in space.

[6,7,9,14,15,16], Pentland [11], and our own [12,13], to determine what can and
cannot be accomplished, and to consider the informational requirements and limitations
of these approaches. We discuss the dilemma of local computation versus
global constraint propagation and seek to ascertain what can be computed locally,
and how information can be propagated across an image. Finally, we seem to be
left with the conclusion that shading, when viewed as a single source of shape information,
is an insufficient source for the recovery of surface shape. Shape cannot
be  obtained  from  shading  alone.  However,  we  are  able  to  characterize  the  scene 
information  that  shading  provides. 

An  alternative  approach  to  recovering  shape  from  shading  is  model  based.  Can 
we  determine  which  model,  from  a set  of  models,  best  describes  the  image  data? 
This  approach  is  dependent  on  discovering  a small  set  of  easily  distinguishable 
models  that  adequately  describe  the  surfaces  encountered.  Industrial  inspection, 
rather than remote sensing of the environment, appears better suited to a model-based
procedure. In this assessment we do not consider this related, but essentially
different, approach.

2.  Approaches  to  Shape  from  Shading 

2.1  Horn  and  Colleagues 

A study by Horn [6,7,8] of the relationship among image irradiance,² surface
shape,  surface  albedo,  and  illumination  conditions  led  to  formulation  of  the  image 

²Image irradiance is the light flux per unit area falling on the image, i.e., incident flux density.

irradiance equation, which states that image irradiance is proportional to scene
radiance.³ This is expressed by the equation

I = R , 

where I is the image irradiance as a function of the image coordinates, and R is
the scene radiance as a function of the scene parameters. Of course, this equation
relates the image irradiance at a position in the image to the scene radiance at
its  corresponding  scene  position.  Implicit  in  this  equation  is  an  assumption  of 
orthographic  projection.  However,  such  an  assumption,  to  avoid  complexity  in  the 
mathematical  formulation,  is  a minor  restriction  and  does  not  detract  from  the 
generality  of  the  model. 

Image  irradiance  is  a function  of  the  image  coordinates  x and  y,  but  scene 
radiance  is  a function  of  the  illumination  strength,  its  position,  the  surface  albedo, 
and  the  surface  orientation.  For  the  formulations  reviewed  here,  we  find  that 
a number  of  assumptions  are  made  so  that  scene  radiance  can  be  considered  a 
function  of  the  surface  orientation  variables  only;  constant  values  are  used  for  the 
illumination  strength,  its  position,  and  for  the  surface  albedo.  That  is,  shape- 
from-shading  is  analyzed  for  the  simplified  case  of  a constant  light  source  and 
constant  surface  albedo.  The  restriction  to  a constant  light  source  is  not  only  a good 
approximation  of  the  situation  we  experience  daily  (and  an  excellent  approximation 
for  a photograph),  but  also  corresponds  to  the  difficulty  confronting  the  human 
visual  system  when  this  constancy  is  not  met,  e.g.,  under  strobe  lighting.  The 
assumption  of  constant  albedo  is  harder  to  justify,  since  nature  obviously  exhibits 

³Scene radiance is the light flux per unit projected area per unit solid angle emitted from the
scene, i.e., emitted flux density per unit solid angle.

variable albedo. Still, when we consider the manner in which facial make-up is used
to  alter  the  perceived  shape  of  the  face,  it  may  well  be  that  continuous  changes 
in  albedo  are  processed  by  the  human  visual  system  as  if  they  were  constant. 
Notwithstanding  the  justification  for  constant  albedo,  it  is  unlikely  that  shape- 
from-shading  can  be  solved  for  the  case  of  variable  albedo  if  it  cannot  be  solved 
for  constant  albedo.  Such  a restriction  is  in  effect  a case  analysis  to  determine  if 
shading  provides  sufficient  shape  information  in  a less-than-general  model. 

In the formulations under review, various parameterizations of surface orientation
have been used. The two we specify are (i) surface gradients, i.e., the partial
derivatives of depth, z, with respect to the scene (and image) coordinates x and y,
and (ii) components of the surface normal, i.e., l and m, the x and y components
of the surface normal. Using the notation $p = \partial z/\partial x$ and $q = \partial z/\partial y$, we note the
equivalence of the parameterizations

$$p = \frac{-l}{\sqrt{1 - l^2 - m^2}}, \quad \text{and} \quad q = \frac{-m}{\sqrt{1 - l^2 - m^2}}.$$

The image irradiance equation is usually expressed as

$$I(x, y) = R(p, q), \quad \text{or} \quad I(x, y) = R(l, m),$$

and we shall use both forms to express the relationship between image irradiance and
scene radiance for the case of constant illumination and constant albedo. As
$p = \partial z/\partial x$ and $q = \partial z/\partial y$, we see that the image irradiance equation is a first-order partial
differential equation and, if I and R are known, we could (at least in principle) solve
the differential equation and recover the depth, z.
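
The equivalence of the two parameterizations can be checked from the unit surface normal. The short derivation here assumes the normal is chosen to point toward the viewer (positive z component); the symbol $n$ for its third component is introduced only for this illustration and is not the authors' notation.

```latex
\begin{align*}
(l,\, m,\, n) &= \frac{(-p,\, -q,\, 1)}{\sqrt{1 + p^2 + q^2}},
\qquad
n = \sqrt{1 - l^2 - m^2} = \frac{1}{\sqrt{1 + p^2 + q^2}}, \\[4pt]
p &= -\frac{l}{n} = \frac{-l}{\sqrt{1 - l^2 - m^2}},
\qquad
q = -\frac{m}{n} = \frac{-m}{\sqrt{1 - l^2 - m^2}}.
\end{align*}
```

Both parameterizations therefore carry the same two degrees of freedom; the gradient form becomes unbounded as the surface turns edge-on to the viewer ($n \to 0$), while $l$ and $m$ stay within the unit disk.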

To  have  an  explicit  form  for  R,  we  must  have  a model  for  the  type  of  reflection 
occurring at the scene surfaces. In the work reviewed here the surface is assumed
to be a perfectly uniform diffuse reflector, i.e., the scene radiance is isotropic.⁴
While this model is invalid as a description of specular reflection, scene radiance in
the natural world (except for specific situations, such as water surfaces) may be
approximated  by  such  a description.  The  expression  for  scene  radiance  in  this  case 
is  [12] 

$$R(l, m) = al + bm + c\sqrt{1 - l^2 - m^2}$$

or, equivalently,

$$R(p, q) = \frac{-ap - bq + c}{\sqrt{1 + p^2 + q^2}},$$

where a, b, and c are constants expressing illumination strength, its position, and
the surface albedo.
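
As a quick numerical illustration, the two forms of the scene radiance can be evaluated side by side. This is a minimal sketch: the function names, the constants a, b, c, and the sample orientation are hypothetical values chosen for the example, not values from the work reviewed.

```python
import math

def radiance_lm(l, m, a, b, c):
    """Scene radiance in surface-normal form: a*l + b*m + c*sqrt(1 - l^2 - m^2)."""
    return a * l + b * m + c * math.sqrt(1.0 - l * l - m * m)

def radiance_pq(p, q, a, b, c):
    """Scene radiance in gradient form: (-a*p - b*q + c) / sqrt(1 + p^2 + q^2)."""
    return (-a * p - b * q + c) / math.sqrt(1.0 + p * p + q * q)

def normal_to_gradient(l, m):
    """Convert normal components (l, m) to surface gradients (p, q)."""
    n = math.sqrt(1.0 - l * l - m * m)  # third component of the unit normal
    return -l / n, -m / n

# Hypothetical illumination/albedo constants and one sample orientation.
a, b, c = 0.3, 0.2, 0.9
l, m = 0.25, -0.4

p, q = normal_to_gradient(l, m)
r_lm = radiance_lm(l, m, a, b, c)
r_pq = radiance_pq(p, q, a, b, c)
print(r_lm, r_pq)  # the two forms should agree to rounding error
```

The agreement follows because $\sqrt{1 + p^2 + q^2} = 1/\sqrt{1 - l^2 - m^2}$ under the conversion above.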

The approach taken by Horn and his colleagues [6,7,9,14,15,16] is to solve the
first-order partial differential equation,

$$I(x, y) = \frac{-ap - bq + c}{\sqrt{1 + p^2 + q^2}},$$

assuming that a, b, and c are known, i.e., the surface albedo, the illumination
strength, and its position. While this need to know scene parameters may seem overrestrictive,
such information may come from other components of a vision system.
The  need  to  know  the  illumination  position  does  not  seem  to  be  a major  drawback  of 
this  approach,  but  the  requirement  that  the  scene  albedo  be  known  is  troublesome. 
If  the  conceptual  model  of  intrinsic  images  is  to  be  followed,  the  inability  to  decouple 
surface  orientation  from  surface  albedo  would  seem  fundamental.  Regardless  of  this 

⁴This situation is also called Lambertian reflectance, after Lambert, who proposed a point reflection
model  (in  which  the  reflected  flux  per  unit  surface  area  per  unit  solid  angle  varied  as  the  cosine 
of  the  angle  between  the  surface  normal  and  the  viewing  direction)  to  account  for  the  observation 
that matt surfaces looked equally bright from any viewing position.

difficulty, the question of whether shape can be recovered in a limited domain is
basic  to  the  investigation  of  vision. 

Two approaches to solving the image irradiance equation are direct integration
[6,7], and iterative/relaxation techniques [9,14,15,16]. The direct integration
approach  has  been  carried  out  generally  in  those  circumstances  in  which  I(x,  y)  and 
its  derivatives  can  be  determined  for  all  x and  y,  i.e.,  for  a spatially  unquantized, 
continuous-tone  image.  The  method  used  is  the  standard  technique  of  characteristic 
strips  for  solving  a first-order  hyperbolic  partial  differential  equation  [6,7].  Starting 
with  a point  at  which  the  surface  orientation  is  known,  integration  moves  along  a 
curve  in  the  image.  This  curve  is  dictated  by  the  image.  Adjacent  curves  generally 
are  not  ‘parallel’,  which  makes  it  difficult  to  get  complete  coverage  of  the  image. 
Interpolation  between  these  curves  — or  strips,  as  they  are  usually  called  — to 
find  initial  values  to  commence  an  intervening  strip  integration,  involves  complex 
procedures.  As  far  as  digital  images  are  concerned,  direct  integration  would  be  hard 
to  organize,  even  if  we  were  first  to  model  the  intensities  to  obtain  a continuous 
form  for  I(x,  y). 
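
For reference, the strip equations can be written out explicitly. What follows is the textbook method of characteristics applied to $F = R(p, q) - I(x, y) = 0$, with $s$ an arbitrary parameter along the strip (a symbol introduced here for illustration); it is offered as a sketch and may differ in detail from the formulation in [6,7].

```latex
\begin{align*}
\frac{dx}{ds} &= R_p, &
\frac{dy}{ds} &= R_q, &
\frac{dz}{ds} &= p\,R_p + q\,R_q, \\[4pt]
\frac{dp}{ds} &= I_x, &
\frac{dq}{ds} &= I_y .
\end{align*}
```

The first two equations show why the integration path is dictated by the image: the step direction in (x, y) depends on the current orientation (p, q), which in turn is driven by the measured derivatives $I_x$ and $I_y$.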

As  is  the  case  with  most  partial  differential  equations,  it  should  be  noted  that 
the  image  irradiance  equation  has  many  solutions  [4].  The  boundary  conditions  (in 
the  above  method  the  initial  values  for  a strip)  are  vital  in  selecting  the  solution 
that describes the surface in the image. Should the image irradiance equation be
‘underconstrained’ in the sense that, for a given I(x, y), it admits solutions that
encompass  a wide  range  of  surface  types  with  similar  boundary  values,  we  might 
then  expect  numerical  error  to  defeat  attempts  at  numerical  integration.  In  such 
cases, errors ‘mix in’ other solutions that can eventually dominate the recovered
solution, even though they may be excluded by the boundary conditions. The
method  of  direct  integration  has  been  demonstrated  on  simple  images  [6].  These 
examples  required  only  a small  number  of  integration  steps.  Numerical  instability 
has  also  been  reported  [4]. 

The  other  approach  used  to  solve  the  image  irradiance  equation  is  relaxation. 
Relaxation procedures avoid numerical instability, but face the problem of convergence.
However, they do have the advantage of being directly applicable to digital
images, i.e., spatially quantized, discrete-tone images. The relaxation (or iterative)
approach views the image irradiance equation not as a differential equation, but as
an algebraic constraint. For pixel (i, j),

$$I_{i,j} = R(p_{i,j}, q_{i,j}),$$

where $I_{i,j}$ is the image irradiance for the (i, j)th pixel, and $p_{i,j}$ and $q_{i,j}$ specify
the surface orientation of the surface patch that is imaged at pixel (i, j). As an
algebraic constraint, the image irradiance equation relates image irradiance to the
two surface orientation variables, $p_{i,j}$ and $q_{i,j}$. In viewing the image irradiance
equation as an algebraic constraint, we lose the interrelationship of $p_{i,j}$, $q_{i,j}$, and
their neighboring values, a relationship inherent in the differential equation. To
compensate for this loss, an additional constraint must be introduced that relates
$p_{i,j}$ and $q_{i,j}$ to their neighboring values. Such a relationship is essential for a
relaxation procedure. The relationship usually introduced attempts to capture the
notion of surface smoothness [2,9,12,14]. The particular form of the smoothness
constraint may, for example, require that $p_{i,j}$ and $q_{i,j}$ be equal to the mean values
of neighboring p’s and q’s. For any trial values for $p_{i,j}$ and $q_{i,j}$, the constraint
imposed by the image irradiance equation and the constraint resulting from surface
smoothness  will  not  be  completely  satisfied.  The  residual  equation  formed  from 
each constraint specifies how well that constraint is satisfied. If $e_{i,j}$ is the sum of
the [absolute values of the] residuals from both the image irradiance constraint and
the surface smoothness constraint for the (i, j)th pixel, then, for trial values of p
and q for every image pixel, the total residual error is

$$E = \sum_{(i,j) \in \text{image}} e_{i,j}.$$

The allocation of surface orientations to all pixels should minimize this total error,
that is,

$$\frac{\partial E}{\partial p_{i,j}} = 0 \quad \forall\, (i, j) \in \text{image},$$

$$\frac{\partial E}{\partial q_{i,j}} = 0 \quad \forall\, (i, j) \in \text{image}.$$

From  these  equations  we  obtain  an  iterative  scheme  for  updating  the  values  of  p 
and  q so  that  they  are  compatible  with  their  neighboring  values,  as  well  as  with 
the  image  irradiance  equation  [9,12,14].  If  such  a scheme  is  convergent,  we  have  a 
procedure  for  obtaining  shape  from  shading. 
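
The flavor of such an iterative scheme can be sketched in a few lines. The code here is a plain gradient-descent stand-in, not the update rules of [9,12,14]: it assumes squared (rather than absolute) residuals, a smoothness term that penalizes differences between 4-connected neighbors, and boundary pixels held at their true orientation. The weight `lam`, the step size, the light constants, and the synthetic spherical cap are all hypothetical choices made for the illustration.

```python
import math

def radiance(p, q, light):
    """Lambertian radiance R(p, q) = (-a p - b q + c) / sqrt(1 + p^2 + q^2)."""
    a, b, c = light
    return (-a * p - b * q + c) / math.sqrt(1.0 + p * p + q * q)

def radiance_grad(p, q, light):
    """Partial derivatives (R_p, R_q) of the radiance above."""
    a, b, c = light
    s = math.sqrt(1.0 + p * p + q * q)
    num = -a * p - b * q + c
    return -a / s - num * p / s**3, -b / s - num * q / s**3

def total_error(P, Q, I, light, lam):
    """Squared irradiance residuals plus lam * squared neighbor differences."""
    n = len(I)
    err = 0.0
    for i in range(n):
        for j in range(n):
            err += (I[i][j] - radiance(P[i][j], Q[i][j], light)) ** 2
            if i + 1 < n:
                err += lam * ((P[i][j] - P[i + 1][j]) ** 2 + (Q[i][j] - Q[i + 1][j]) ** 2)
            if j + 1 < n:
                err += lam * ((P[i][j] - P[i][j + 1]) ** 2 + (Q[i][j] - Q[i][j + 1]) ** 2)
    return err

def relax(I, P, Q, light, lam=0.1, step=0.02, iters=300):
    """Gradient descent on total_error; boundary pixels are never updated."""
    n = len(I)
    for _ in range(iters):
        P_new = [row[:] for row in P]
        Q_new = [row[:] for row in Q]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                res = I[i][j] - radiance(P[i][j], Q[i][j], light)
                Rp, Rq = radiance_grad(P[i][j], Q[i][j], light)
                nbrs = ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                gp = -2.0 * res * Rp + 2.0 * lam * sum(P[i][j] - P[u][v] for u, v in nbrs)
                gq = -2.0 * res * Rq + 2.0 * lam * sum(Q[i][j] - Q[u][v] for u, v in nbrs)
                P_new[i][j] = P[i][j] - step * gp
                Q_new[i][j] = Q[i][j] - step * gq
        P, Q = P_new, Q_new
    return P, Q

# Synthetic scene: a cap of the unit sphere z = sqrt(1 - x^2 - y^2).
n = 9
lam = 0.1
light = (0.3, 0.2, 0.93)  # hypothetical constants (a, b, c)
xs = [-0.5 + i / (n - 1) for i in range(n)]
P_true = [[-x / math.sqrt(1 - x * x - y * y) for x in xs] for y in xs]
Q_true = [[-y / math.sqrt(1 - x * x - y * y) for x in xs] for y in xs]
I = [[radiance(P_true[i][j], Q_true[i][j], light) for j in range(n)] for i in range(n)]

def on_edge(i, j):
    return i in (0, n - 1) or j in (0, n - 1)

# True orientation on the boundary, a flat (p = q = 0) guess in the interior.
P0 = [[P_true[i][j] if on_edge(i, j) else 0.0 for j in range(n)] for i in range(n)]
Q0 = [[Q_true[i][j] if on_edge(i, j) else 0.0 for j in range(n)] for i in range(n)]

err_before = total_error(P0, Q0, I, light, lam)
P1, Q1 = relax(I, P0, Q0, light, lam=lam)
err_after = total_error(P1, Q1, I, light, lam)
print(err_before, err_after)
```

Note that the only coupling between pixels in `relax` comes from the smoothness term; the irradiance residual at a pixel involves that pixel's own p and q alone.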

It should be noted that the relaxation schemes that use the foregoing approach
are  possible  only  because  the  smoothness  constraint  relates  the  values  at  one  pixel 
to  those  of  its  neighboring  pixels.  The  boundary  conditions  needed  for  selecting  a 
particular  solution  from  the  solution  set  of  the  iterative  scheme  are  propagated  by 
the  smoothness  constraint,  not  the  image  irradiance  equation.  Compared  with  the 
direct-integration  approach,  information  propagation  in  the  relaxation  scheme  uses 
a different  mechanism.  We  must  remember  this  when  we  assess  results. 

Success with these methods has generally been limited to small images (usually
fewer than 30 x 30 pixels) of nearly spherical or saddle surfaces [3,9,12,14]. For an
effective relaxation scheme, the initial solution should have no effect on the surface
recovered. This unfortunately is not the case [12]. Boundary conditions are not
propagated more than a few pixels by the smoothness constraints [12,14]. Surface
recovery from large images (bigger than 30 x 30 pixels) is ineffectual for this reason.
As a consequence of the fact that smoothness is used as the information propagator,
assumptions (albeit weak ones) have been made about surface shape. Shading as a
constraint, and smoothness as a surface type, appear insufficient to provide a basis
for an effective shape-from-shading algorithm.

2.2  Pentland 

The  approaches  to  solving  the  shape-from-shading  task  that  we  have  discussed 
so  far  have  all  been  based  on  constraint  propagation.  Direct  integration  is  a 
spatially  serial  solution  to  the  propagation  problem,  while  relaxation  attempts  to 
achieve  this  propagation  with  a temporally  serial  solution;  in  other  words,  relaxation 
employs  local  processing,  but  it  must  iterate  until  enough  cycles  have  passed  to  allow 
information  to  propagate  spatially.  Purely  local  computation  of  scene  parameters, 
on  the  other  hand,  is  not  a propagation  method.  While  this  kind  of  computation 
can  use  neighboring  data  — and  not  just  of  the  nearest  neighbors  — it  must  provide 
an  instant  solution.  It  cannot  iterate  and  therefore  it  does  not  provide  a temporally 
serial  solution.  Such  an  approach  to  scene  parameter  computation  avoids  the 
numerical  instability  of  direct  integration  methods,  as  well  as  the  convergence 
and  propagation  problems  of  relaxation,  but  it  cannot  use  spatially  distant  scene 
information. A local computation can use global information, such as the position
of the light source, but it cannot use scene details, such as the position of a distant
edge.  Of  course,  the  reason  for  carrying  out  purely  local  computation  stems  from  the 
hypothesis  that  such  scene  detail  is  not  involved  in  the  computation  at  this  level  in 
the  visual  system.  Can  shading  provide  sufficient  local  information  to  allow  recovery 
of  surface  shape  by  purely  local  computation?  This  is  the  question  addressed  by 
Pentland  [11]. 

The inadequacy of local image measurements for specifying surface orientation
can be understood by counting the variables needed to specify various image
measurements.  Let  us  consider  the  case  of  a uniformly  diffuse  reflecting  surface. 
Image  irradiance  (1  measurement)  is  a function  of  surface  orientation  (2  parameters), 
the  product  of  surface  albedo  and  illumination  strength  (1  parameter),  and  the 
position  of  the  light  source  (2  parameters).  The  gradients  of  image  irradiance  (2 
measurements) are functions of the same variables as image irradiance and, additionally,
are functions of surface curvature (3 parameters). The second derivatives
of image irradiance (3 measurements) are functions of all the variables mentioned
above, plus the rates of change of curvature (4 parameters). Because higher
image-irradiance  derivatives  introduce  more  surface  shape  derivatives,  we  have  more 
parameters  than  measurements.  It  should  be  noted  that  a knowledge  of  global 
quantities,  such  as  the  illumination  position  and  the  product  of  surface  albedo  and 
illumination strength, is not sufficient to allow the surface orientation to be computed
locally. If we make assumptions about the relationship among some of the
above parameters, we can produce a system of equations from which surface orientation
can be calculated.
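
The counting argument can be tabulated directly. The snippet is only a bookkeeping sketch of the numbers quoted in the paragraph above; the labels and variable names are chosen here for illustration.

```python
# Each entry: (image quantity, number of measurements it supplies,
#              unknowns that first appear at that derivative order).
levels = [
    ("irradiance", 1, {"surface orientation": 2,
                       "albedo x illumination strength": 1,
                       "light-source position": 2}),
    ("first derivatives", 2, {"surface curvature": 3}),
    ("second derivatives", 3, {"rates of change of curvature": 4}),
]

tally = []
measurements = 0
unknowns = 0
for name, n_meas, new_unknowns in levels:
    measurements += n_meas
    unknowns += sum(new_unknowns.values())
    tally.append((name, measurements, unknowns))
    print(f"up to {name}: {measurements} measurements, {unknowns} unknowns")
```

At every order the cumulative number of unknowns exceeds the cumulative number of measurements, and supplying the three global quantities (light-source position and the albedo-illumination product) still leaves more unknowns than measurements.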


Pentland investigates the case in which an image patch of a uniformly diffuse
reflecting surface can be considered identical to a point on an illuminated sphere
whose reflection is also uniformly diffuse [11]. He calculates the orientation of the
surface patch on the sphere that has the same appearance as the surface patch in
the image. Not all image patches can be represented by points on an illuminated
sphere. Spheres whose reflection is uniformly diffuse have the property

$$\frac{I_{xx}}{I_{yy}} > 0,$$

where subscripts denote partial differentiation with respect to those subscripts.
There are surfaces, e.g., a sinusoidal surface, for which $I_{xx}/I_{yy}$ can be negative. The
procedure  for  estimating  surface  orientation  that  is  based  on  the  assumption  that 
surfaces  can  be  approximated  by  locally  spherical  patches  is  applicable  only  to 
parts  of  an  image.  Notwithstanding  these  restrictions,  an  important  aspect  of  the 
assumption  of  local  sphericity  is  that  the  surface  orientation  is  calculated  by  using 
the second derivatives of image irradiance only, i.e.,

$$\frac{1 - m^2}{lm} = \frac{I_{xx}}{I_{xy}}, \qquad \frac{1 - l^2}{lm} = \frac{I_{yy}}{I_{xy}}.$$

These equations are derived by differentiating the image irradiance equation and
noting that, for a sphere, $l_x = 1/r$, $l_y = 0$, $m_x = 0$, and $m_y = 1/r$, where r is the
sphere’s radius.
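
One way to see that these relations fix the orientation is to invert them numerically. The sketch below is an algebraic inversion written for this illustration, not Pentland's published procedure: it writes $1 - m^2 = k I_{xx}$, $lm = k I_{xy}$, $1 - l^2 = k I_{yy}$ for a common scale $k$ (a symbol introduced here), solves the resulting quadratic for $k$, and keeps the smaller-magnitude root, since the other root makes $l^2$ and $m^2$ negative. The function name and the test values (c, r, l, m) are hypothetical.

```python
import math

def local_sphere_orientation(Ixx, Ixy, Iyy, tol=1e-9):
    """Return (|l|, |m|, sign of l*m) from second derivatives of irradiance,
    assuming the patch images a point on a uniformly diffuse sphere."""
    trace = Ixx + Iyy
    if trace == 0.0:
        raise ValueError("derivatives are inconsistent with a spherical patch")
    disc = math.sqrt((Ixx - Iyy) ** 2 + 4.0 * Ixy ** 2)
    # Smaller-magnitude root of (Ixx*Iyy - Ixy^2) k^2 - (Ixx + Iyy) k + 1 = 0,
    # written in a form that avoids dividing by Ixy or by the leading coefficient.
    k = 2.0 / (trace + math.copysign(disc, trace))
    l_sq = 1.0 - k * Iyy
    m_sq = 1.0 - k * Ixx
    if l_sq < -tol or m_sq < -tol or l_sq + m_sq > 1.0 + tol:
        raise ValueError("derivatives are inconsistent with a spherical patch")
    prod = k * Ixy  # equals l*m
    sign_lm = (prod > 0) - (prod < 0)
    return math.sqrt(max(l_sq, 0.0)), math.sqrt(max(m_sq, 0.0)), sign_lm

# Synthetic check: second derivatives of I = a*l + b*m + c*n on a sphere of
# radius r, where l = x/r, m = y/r, n = sqrt(1 - l^2 - m^2).
c, r = 1.0, 2.0
l, m = 0.3, -0.4
n = math.sqrt(1.0 - l * l - m * m)
scale = -c / (r * r * n ** 3)
Ixx, Ixy, Iyy = scale * (1 - m * m), scale * l * m, scale * (1 - l * l)

abs_l, abs_m, sign_lm = local_sphere_orientation(Ixx, Ixy, Iyy)
print(abs_l, abs_m, sign_lm)
```

Because the second derivatives depend only on $l^2$, $m^2$, and the product $lm$, the pair $(l, m)$ is recovered only up to a simultaneous sign change; inputs with $I_{xx}$ and $I_{yy}$ of opposite sign are rejected, which mirrors the positivity property stated above.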

In this model, surface orientation is directly dependent neither on image irradiance
nor on the first derivatives of image irradiance. It may be estimated even in
images that exhibit linear changes in irradiance induced by artifacts, and in images
that exhibit constant illumination levels induced by atmospheric effects, such as
backscatter. More importantly, the formulas are independent of the illumination
parameters and the surface albedo. In exchange for acceptance of a restrictive assumption
with respect to surface type, one acquires not only a means of calculating
surface orientation, but a procedure that needs no information other than image
measurements: a procedure, in effect, that is matched to the notion of intrinsic
images.

Even in those areas of an image to which this approximation can be applied, the
assumption that a surface can be approximated by a patch with the same curvature
in any direction needs experimental verification. The world is obviously not composed
of such surfaces, but the difference between the estimated and the actual
surface orientation is more important than the error made in approximating
the surface by a spherical patch. Application of the above formula yields qualitative
agreement between the estimated and actual shape in synthetic images and in
natural images of simple objects [11] (for which $I_{xx}/I_{yy}$ is generally positive). Shape
estimates from synthetic images of ellipsoidal surfaces are ‘flatter’ than the actual
shapes. It should be noted that shape estimates, which are integrated surface orientations,
often appear ‘better’ than what might be expected on the basis of the
surface orientation error. An algorithm based on approximating a surface patch by
a spherical one seems better suited for computing the qualitative shape of a surface
than the orientation of surface elements. Such an algorithm is applicable only to
those image patches that are consistent with the interpretation of such patches as
points on a sphere. The conditions necessary for enabling this kind of interpretation
have not been fully characterized. Alternative models that are applicable when an
image patch is inconsistent with its interpretation as a point on a sphere are
currently unknown.

In principle, because image irradiance is not differentiable at boundaries, we
cannot apply the above method there. However, unlike propagation methods, which
require our knowing boundary positions in order to stop computation, the local-computation
approach may accomplish this simply by indicating (through its failure
at a boundary) where the boundary is.

Pentland’s  approach  hinges  on  the  local-sphericity  assumption.  In  restricted 
circumstances  he  is  able  to  estimate  surface  orientation  directly  from  the  second 
derivatives  of  the  image  irradiance.  What  other,  perhaps  less  specific,  assumptions 
can  be  made  that  allow  shape  to  be  estimated  locally?  Before  attempting  to  answer 
this, we review the shape-from-shading formulation we have previously proposed
[12,13]: first, to assess its performance, then to provide the requisite analytical
tools for answering questions about local computation.

2.3  Smith 

The  approach  taken  by  Horn  and  his  colleagues  provides  a formulation  of  the 
shape-from-shading  task  that  requires  knowledge  of  scene  parameters,  but  places 
no  restriction  on  the  surface  shape.  Calculation  of  surface  orientation  is  not  a 
local  process,  and,  if  surface  orientation  is  to  be  recovered,  knowledge  of  boundary 
conditions  is  necessary.  Pentland,  on  the  other  hand,  restricts  the  surface  shape 
but  requires  no  scene  parameters,  no  boundary  conditions,  and  derives  surface 
orientation  by  purely  local  computation.  Is  there  an  intermediate  position?  Is 
there  a formulation  that  neither  restricts  the  surface  shape  nor  requires  knowledge 
of scene parameters? Of course, local computation seems desirable, but is it
worth the concomitant cost of surface type restriction or the requirement that scene
parameters be known a priori? The formulation previously described by us takes
such an intermediate position.

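For a uniformly diffuse reflecting surface, surface orientation is related to image
irradiance by the second-order partial differential equations [12]

$$\alpha\theta\, l_{xx} + \beta\theta\, m_{xx} - \alpha\gamma\, l_{xy} - \beta\gamma\, m_{xy} = \chi\theta\, I_{xx} - \chi\gamma\, I_{xy},$$

$$\alpha\theta\, l_{yy} + \beta\theta\, m_{yy} - \alpha\delta\, l_{xy} - \beta\delta\, m_{xy} = \chi\theta\, I_{yy} - \chi\delta\, I_{xy},$$

where

$$\begin{aligned}
\alpha &= I_x m_y - I_y m_x, \\
\beta &= I_y l_x - I_x l_y, \\
\gamma &= l_x^2(1 - m^2) + m_x^2(1 - l^2) + 2\, l_x m_x\, l m, \\
\delta &= l_y^2(1 - m^2) + m_y^2(1 - l^2) + 2\, l_y m_y\, l m, \\
\theta &= l_x l_y(1 - m^2) + m_x m_y(1 - l^2) + (l_x m_y + l_y m_x)\, l m, \\
\chi &= l_x m_y - l_y m_x.
\end{aligned}$$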

These  equations  are  derived  from  the  image  irradiance  equation.  The  assumption  of 
uniformly  diffuse  reflection  relates  some  of  the  scene  parameters,  thereby  allowing 
elimination  of  parameters  that  specify  surface  albedo  and  illumination  conditions. 
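
The pair of second-order equations can be spot-checked numerically on a doubly curved surface. The sketch assumes a quadratic test surface and light constants chosen arbitrarily for this example, estimates all derivatives by central differences, and evaluates the difference between the left- and right-hand sides of each equation. The coefficient names in the code (alpha, beta, gamma, delta, theta, chi) follow the six definitions given with the equations; everything else (function names, step size, sample point) is hypothetical.

```python
import math

A, B, C = 0.8, 0.3, 1.4   # hypothetical surface z = -(A x^2 + 2 B x y + C y^2) / 2
LIGHT = (0.3, 0.2, 0.9)   # hypothetical constants a, b, c

def normal_lm(x, y):
    """Normal components (l, m) from the gradients p = dz/dx, q = dz/dy."""
    p = -(A * x + B * y)
    q = -(B * x + C * y)
    s = math.sqrt(1.0 + p * p + q * q)
    return -p / s, -q / s

def l_of(x, y):
    return normal_lm(x, y)[0]

def m_of(x, y):
    return normal_lm(x, y)[1]

def irradiance(x, y):
    """I = a*l + b*m + c*sqrt(1 - l^2 - m^2) for a uniformly diffuse surface."""
    l, m = normal_lm(x, y)
    a, b, c = LIGHT
    return a * l + b * m + c * math.sqrt(1.0 - l * l - m * m)

def derivs(f, x, y, h=1e-3):
    """Central-difference estimates (f_x, f_y, f_xx, f_xy, f_yy)."""
    f0 = f(x, y)
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    fxx = (f(x + h, y) - 2 * f0 + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f0 + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fx, fy, fxx, fxy, fyy

def residuals(x, y):
    """Left side minus right side of each second-order equation, plus chi."""
    l, m = normal_lm(x, y)
    lx, ly, lxx, lxy, lyy = derivs(l_of, x, y)
    mx, my, mxx, mxy, myy = derivs(m_of, x, y)
    Ix, Iy, Ixx, Ixy, Iyy = derivs(irradiance, x, y)
    alpha = Ix * my - Iy * mx
    beta = Iy * lx - Ix * ly
    gamma = lx**2 * (1 - m * m) + mx**2 * (1 - l * l) + 2 * lx * mx * l * m
    delta = ly**2 * (1 - m * m) + my**2 * (1 - l * l) + 2 * ly * my * l * m
    theta = lx * ly * (1 - m * m) + mx * my * (1 - l * l) + (lx * my + ly * mx) * l * m
    chi = lx * my - ly * mx
    r1 = (alpha * theta * lxx + beta * theta * mxx
          - alpha * gamma * lxy - beta * gamma * mxy
          - (chi * theta * Ixx - chi * gamma * Ixy))
    r2 = (alpha * theta * lyy + beta * theta * myy
          - alpha * delta * lxy - beta * delta * mxy
          - (chi * theta * Iyy - chi * delta * Ixy))
    return r1, r2, chi

r1, r2, chi = residuals(0.2, -0.1)
print(r1, r2, chi)  # residuals should be near zero, chi clearly nonzero
```

The light constants enter only through `irradiance`; the residual computation itself never uses them, which is the sense in which albedo and illumination parameters have been eliminated.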

The  assumption  that  surface  reflection  is  uniformly  diffuse  is  an  assumption 
about  the  physics  of  image  formation.  While  it  does  not  describe  the  reflectance 
properties of all surfaces, it is a reasonable approximation to most surfaces that are
encountered  in  the  natural  world.  For  any  formulation  of  the  relationship  between 
shading  and  shape,  some  assumptions  are  necessary.  Those  describing  properties 
found  in  nature  are  more  palatable  than  restrictions  for  which  little  a priori  evidence 
is  available. 

A desirable  aspect  of  this  formulation  is  that  surface  orientation  is  not  related  to 
image  irradiance,  but  only  to  its  derivatives.  The  existance  of  constant  illumination 
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levels,  from  atmospheric  scattering  or  fogging  of  photographic  images,  does  not 
impede  the  potential  for  shape  recovery.  Linear  changes  in  intensity,  however, 
must  affect  the  shape  of  any  recovered  surface.  A more  important  aspect  of  this 
formulation is its independence of surface albedo. We reiterate that, if
the notion of intrinsic images is to be useful, we must find models that decouple
surface shape from surface reflectance. That knowledge of the illumination
conditions is not required is certainly an important aspect, but less so than the
formulation's independence of surface albedo.

The penalty for not making assumptions about surface type and for not presupposing
any knowledge of scene parameters, such as illumination conditions and surface
albedo, is the introduction of higher-order derivatives of surface orientation in
the formulation, as well as the inability to calculate surface orientation by purely
local  computation.  Boundary  conditions  are  necessary.  To  formulate  a model  that 
relates  surface  orientation  to  image  irradiance  is  one  thing;  to  solve  it  for  that 
orientation  is  another. 

The second-order partial differential equations (given above) relating surface
orientation and image irradiance are satisfied by solutions to the first-order partial
differential equation $\chi = 0$. This is undesirable, as solutions of $\chi = 0$ satisfy the
surface-orientation-image-irradiance equations independently of the image measurements
$I_x$, $I_y$, $I_{xx}$, $I_{yy}$, and $I_{xy}$. The equation $\chi = 0$ characterizes the developable
surfaces, e.g., a cylinder or cone (see Appendix B); derivation of the above
surface-orientation-image-irradiance equations is impossible when the surface is developable,
i.e., singularly curved. The surface-orientation-image-irradiance equations are
appropriate only when the surface is doubly curved. For singularly curved surfaces,
the appropriate equations relating surface orientation and image irradiance are

$$I_x^2 \left( m_y (1 - l^2) + l_y l m \right) = I_y^2 \left( l_x (1 - m^2) + m_x l m \right) ,$$

$$I_x m_y - I_y m_x = 0 ,$$

$$I_x l_y - I_y l_x = 0 .$$

(These  equations  are  derived  independently  of  any  reflection  function,  i.e.,  they 
apply  to  all  types  of  reflection,  not  just  uniformly  diffuse  reflection.  See  Appendix 
C.) 

If the surface-orientation-image-irradiance equations were solved by analytic
procedures, the problems posed by the $\chi = 0$ solutions would vanish, as such
solutions would be ruled out by boundary conditions. However, the presence of
such solutions heralds difficulties for numerical methods, as the inevitable numerical
errors will mix these solutions into the recovered surface orientations. Two
approaches to solving the surface-orientation-image-irradiance equations have been
reported [13]. These approaches are direct integration, which is implemented by
finite-difference formulas, and relaxation. Both require additional information in
the form of boundary conditions. Both fail to recover surface orientation over the
image as a whole. Direct integration correctly recovers the surface orientation in the
vicinity of the boundary conditions, but is ineffective elsewhere. The reasons for failure
of each method are of interest: direct integration fails because numerical instability
makes the spatially serial method of solution impractical; relaxation fails because
nonconvergence makes the temporally serial method of solution infeasible. These direct
reasons for failure mask a deeper problem. The model is 'underconstrained' in the sense that
the equations are insensitive to surface orientation. They are more sensitive to other
surface parameters, such as surface curvature [13]. Underconstraint of the model
can account for lack of convergence of relaxation methods, but the numerical problems
in direct integration highlight the difficulty of spatial information propagation
by a mechanism that is under the control of higher-order derivatives.

The  surface-orientation-image-irradiance  equations  alone  do  not  form  the  basis 
for  an  algorithm  to  recover  surface  orientation;  they  do  provide  a tool,  however, 
for  examining  the  constraint  shading  imposes  on  shape.  We  shall  subsequently  use 
them  for  that  purpose. 

3. Local Computation Versus Global-Constraint Propagation

What  can  we  learn  from  these  various  approaches  to  shape-from-shading? 
Direct  integration  of  a differential  model  is  an  inadequate  computational  tool.  Horn 
and  his  colleagues,  using  a low-order  partial  differential  equation,  show  that  some 
propagation  of  information  is  possible  — but  numerical  instability  poses  a difficulty 
even  for  a first-order  equation.  This  limited  success  with  direct  integration  is 
unlikely to be upgradable to a solution procedure for natural scenes. Since higher-order
formulations are plagued with numerical instability, they do not offer any
prospect of success.

A restricting  factor  in  a differential  model  is  the  need  for  knowledge  of  boundary 
conditions.  This  seems  to  be  a major  limitation  of  such  methods.  These  methods 
apply  to  continuous  surface  patches  only  and  require  a priori  knowledge  of  solution 
values  at  some  points  within  every  region.  This  means  that  we  must  find  regional 
boundaries, perhaps ascertain their type, and estimate values of surface orientation
at some points within each region before we can attempt to recover shape. Is this,
in effect, placing the cart before the horse?

Models of the relationship between image measurements and scene variables
that are formulated as low-order differential equations offer no relief from the necessity
of knowing scene parameters. While information about illumination conditions
may be obtainable from other sources within the image, or may be calculated in
parallel with shape, it is difficult to envisage a situation in which the surface albedo
could be calculated before the surface shape. Albedo would seem less constrained
than shape. The author's higher-order differential equations show that derivatives
of image irradiance can be used to remove these parameters.

While  the  relaxation  schemes  used  to  solve  the  image  irradiance  equation 
are  not  quite  viable,  their  drawbacks  may  be  attributed  to  the  weakness  of  the 
surface  shape  constraint,  namely  smoothness,  rather  than  an  inherent  deficiency  of 
relaxation  as  a technique.  For  the  higher-order  surface-orientation-image-irradiance 
equations, insensitivity to surface orientation does not allow assessment of the
strength  of  surface  continuity  (the  constraint  used  in  the  attempts  to  solve  these 
equations  by  relaxation).  The  results  reported  from  these  relaxation  procedures 
can  be  attributed  to  other  aspects  of  the  models  they  embody,  rather  than  to  any 
deficiency  of  the  relaxation  technique  itself.  Relaxation  seems  viable  as  a method 
that  can  satisfy  global  constraints  without  being  dominated  by  numerical  error. 
However, surface shape assumptions that are more restrictive than those used in
the work reviewed appear necessary if information is to be propagated effectively
over reasonable image distances. Relaxation schemes that implement low-order
differential models seem practicable; schemes implementing higher-order differential
models are too sensitive to noise.

In  comparison  with  information  propagation  techniques,  local  computation  of 
surface  orientation,  as  reported  by  Pentland,  requires  strong  restrictions  on  surface 
shape  — and  even  these  are  not  adequate  to  characterize  all  cases.  However,  local 
computation,  particularly  when  it  is  based  on  a model  involving  derivatives  of  image 
irradiance  only,  does  provide  a means  for  recovering  surface  orientation  without  any 
knowledge  of  boundary  conditions,  without  a priori  regional  segmentation  (it  may 
even  help  in  this  endeavor),  and  without  knowing  the  scene  parameters,  especially 
albedo.  Unfortunately,  we  shall  not  get  a solution  to  surface  orientation  that  is 
quantitatively correct because the surface restriction is too great. Local computation
offers  the  computational  features  we  want,  but  the  penalty  to  be  paid  — severe 
surface  shape  restriction  — is  far  too  great. 

What,  then,  seems  practical?  A relaxation  scheme  that  is  more  constrained 
by  surface  type  than  those  that  have  been  examined?  A scheme  that  implements 
a low-order  model  of  information  propagation?  A scheme  that  does  a lot  of  purely 
local  computation?  A scheme  that  can  use  boundary  conditions  wherever  they  are, 
but without being overly dependent on them? Of course, all this is one conjecture.
There may well be a group of models that provide purely local computation,
along  with  a means  of  determining  when  each  model  is  applicable.  Higher-order 
differential  models,  however,  or  low-order  differential  models  that  require  too  much 
a priori  scene  knowledge  do  not  appear  practicable.  For  any  realistic  model  it  seems 
inevitable  that  local  processing  must  play  an  important  role.  Consequently,  what 
can  we  compute  locally  from  the  shading  data?  This  is  the  question  we  shall  now 
address.

4. Analysis of Local Computation

The  relationship  between  surface  orientation  and  image  irradiance  for  a 
uniformly  diffuse  reflecting  surface  that  is  doubly  curved  is  given  by  the  surface- 
orientation-image-irradiance  equations  of  Section  2.3.  Parameter  counting  reveals 
that  local  image  measurements  are  insufficient  to  specify  surface  orientation  for  the 
general  case,  but  shape  constraints  can  overcome  these  degrees  of  freedom.  Since  we 
wish  to  calculate  surface  shape  locally,  we  consider  the  case  in  which  we  can  assume 
a constant  curvature  over  the  small  surface  patch  from  which  we  draw  information 
for the local calculation. The curvature still varies with direction; we only
assume that we can ignore any change in curvature over the surface patch. This
assumption is not valid in general; we are restricting our attention to
this case to simplify the analysis. If we cannot determine what shape information
is  available  in  this  restricted  case,  we  are  not  likely  to  understand  the  general  case. 
For this case, when we ignore curvature change,
$l_{xx} = l_{yy} = l_{xy} = m_{xx} = m_{yy} = m_{xy} = 0$,
and from the surface-orientation-image-irradiance equations we derive
the expressions

$$\frac{I_{xx}}{I_{xy}} = \frac{l_x^2 (1 - m^2) + m_x^2 (1 - l^2) + 2 l_x m_x l m}{l_x l_y (1 - m^2) + m_x m_y (1 - l^2) + (l_x m_y + l_y m_x) l m} ,$$

$$\frac{I_{yy}}{I_{xy}} = \frac{l_y^2 (1 - m^2) + m_y^2 (1 - l^2) + 2 l_y m_y l m}{l_x l_y (1 - m^2) + m_x m_y (1 - l^2) + (l_x m_y + l_y m_x) l m} .$$

Notice  that  these  relationships  are  only  between  surface  shape  and  the  second 
derivatives  of  the  image  irradiance.  It  is  the  assumption  of  constant  curvature, 
not the more restrictive sphericity assumption (used by Pentland to recover surface
orientation from the second derivatives of image irradiance), that is necessary
to relate shape and just the second derivatives of the image irradiance. Image
measurements are generally dependent on scene parameters other than those encoding
shape. The first and second derivatives of image irradiance depend on the
lighting  position  and  the  surface  albedo,  but  the  ratios  of  second  derivatives  are 
independent  of  all  scene  parameters  except  surface  shape. 
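
The independence of these ratios from albedo and lighting can be checked numerically on a surface for which the constant-curvature assumption holds exactly. The sketch below assumes an orthographically projected, uniformly diffuse sphere, for which l = x/r and m = y/r, so the second derivatives of l and m vanish. The function names, the albedo, and the source direction are illustrative assumptions.

```python
import math


def lambertian_sphere(x, y, r=1.0, albedo=0.8, source=(0.2, 0.3, 0.9)):
    """Irradiance of a uniformly diffuse sphere (orthographic view, assumed)."""
    l, m = x / r, y / r
    n = math.sqrt(1.0 - l * l - m * m)
    a, b, c = source
    return albedo * (a * l + b * m + c * n)


def second_derivative_ratios(image, x, y, h=1e-3):
    """Central-difference estimates of Ixx/Ixy and Iyy/Ixy at (x, y)."""
    ixx = (image(x + h, y) - 2 * image(x, y) + image(x - h, y)) / h**2
    iyy = (image(x, y + h) - 2 * image(x, y) + image(x, y - h)) / h**2
    ixy = (image(x + h, y + h) - image(x + h, y - h)
           - image(x - h, y + h) + image(x - h, y - h)) / (4 * h**2)
    return ixx / ixy, iyy / ixy


def predicted_ratios(l, m, lx, ly, mx, my):
    """Right-hand sides of the two expressions, from surface orientation alone."""
    num_xx = lx**2 * (1 - m**2) + mx**2 * (1 - l**2) + 2 * lx * mx * l * m
    num_yy = ly**2 * (1 - m**2) + my**2 * (1 - l**2) + 2 * ly * my * l * m
    den = (lx * ly * (1 - m**2) + mx * my * (1 - l**2)
           + (lx * my + ly * mx) * l * m)
    return num_xx / den, num_yy / den


# Unit sphere at image point (0.3, 0.4): l = 0.3, m = 0.4, lx = my = 1, ly = mx = 0.
measured = second_derivative_ratios(lambertian_sphere, 0.3, 0.4)
predicted = predicted_ratios(0.3, 0.4, 1.0, 0.0, 0.0, 1.0)
```

Changing `albedo` or `source` rescales the individual second derivatives but should leave the measured ratios unchanged, up to finite-difference error.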

Can  we  use  the  above  expressions  to  calculate  surface  orientation?  We  have 
previously  [13]  pointed  to  the  insensitivity  of  surface-orientation-image-irradiance 
equations to surface orientation. The above expressions are also insensitive to surface
orientation. We see this in the following considerations. Algebraic manipulation
of  the  above  expressions  yields 

$$\frac{I_{xx}}{I_{yy}} = \frac{l_x^2 + m_x^2 - (l_x m - l m_x)^2}{l_y^2 + m_y^2 - (l_y m - l m_y)^2} .$$

Suppose that over an image patch we know values of $l$ and $m$ that satisfy the above
expression. Consider now this expression when

$$l' = w_1 l \quad \text{and} \quad m' = w_2 m ,$$

at each point of the image patch. Using finite-difference formulas to calculate the
derivatives of the surface normal, we obtain

$$\frac{I'_{xx}}{I'_{yy}} = \frac{{l'_x}^2 + {m'_x}^2 - (l'_x m' - l' m'_x)^2}{{l'_y}^2 + {m'_y}^2 - (l'_y m' - l' m'_y)^2} = \frac{w_1^2 l_x^2 + w_2^2 m_x^2 - w_1^2 w_2^2 (l_x m - l m_x)^2}{w_1^2 l_y^2 + w_2^2 m_y^2 - w_1^2 w_2^2 (l_y m - l m_y)^2} .$$

Note that, as the magnitude of $w_1$ or $w_2$ is varied, the numerator and denominator of
$I'_{xx}/I'_{yy}$ vary in like manner; both either increase or decrease; $I'_{xx}/I'_{yy}$ remains
approximately equal to $I_{xx}/I_{yy}$. The ratios of the second derivatives of image irradiance are not
sensitive to surface orientation. We cannot get further shape information from
other  image  measurements,  as  the  first  and  second  derivatives  of  image  irradiance 
are  dependent  on  the  surface  albedo  and  the  lighting  conditions,  and  the  image 
irradiance  is  dependent  on  surface  albedo,  lighting  conditions,  and  the  level  of 
constant  illumination  from  such  sources  as  atmospheric  scatter  and  the  dark  current 
of  the  sensor.  Surface  orientation  can  be  computed  locally  only  when  very  restrictive 
assumptions  about  surface  shape  are  made.  Without  such  restrictions  there  is  not 
enough  information  in  the  shading  to  decouple  surface  orientation  effects  from  those 
of  albedo  and  illumination. 

If  shading  is  insufficient  to  allow  surface  orientation  to  be  recovered,  what  then 
does  the  shading  specify?  Does  it  specify  curvature?  Can  we  compute  it  locally? 
Consider the above expressions for $I_{xx}/I_{xy}$ and $I_{yy}/I_{xy}$. Suppose that we know the correct
values for $l$ and $m$ at an image point and we want to calculate $l_x$, $l_y$, $m_x$, and $m_y$.
If $l_x$, $l_y$, $m_x$, and $m_y$ are a solution, then so are $w l_x$, $w l_y$, $w m_x$, and $w m_y$, where $w$
is any constant. Curvature cannot be computed locally (without further shape
assumptions).  The  ratios  of  second  derivatives  of  image  irradiance  contain  shape 
information,  yet  are  insensitive  to  surface  orientation  and  do  not  allow  computation 
of  the  curvature.  What  information  about  the  surface  do  they  encode? 

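To answer this question, we first rewrite the expressions for $I_{xx}/I_{xy}$ and $I_{yy}/I_{xy}$ in
vector dot product form:

$$\frac{I_{xx}}{I_{xy}} = \frac{\left[ \frac{\partial}{\partial x} (l, m, \sqrt{1 - l^2 - m^2}) \right] \cdot \left[ \frac{\partial}{\partial x} (l, m, \sqrt{1 - l^2 - m^2}) \right]}{\left[ \frac{\partial}{\partial x} (l, m, \sqrt{1 - l^2 - m^2}) \right] \cdot \left[ \frac{\partial}{\partial y} (l, m, \sqrt{1 - l^2 - m^2}) \right]} ,$$

$$\frac{I_{yy}}{I_{xy}} = \frac{\left[ \frac{\partial}{\partial y} (l, m, \sqrt{1 - l^2 - m^2}) \right] \cdot \left[ \frac{\partial}{\partial y} (l, m, \sqrt{1 - l^2 - m^2}) \right]}{\left[ \frac{\partial}{\partial x} (l, m, \sqrt{1 - l^2 - m^2}) \right] \cdot \left[ \frac{\partial}{\partial y} (l, m, \sqrt{1 - l^2 - m^2}) \right]} .$$

Using the notation $N = (l, m, \sqrt{1 - l^2 - m^2})$ for the unit surface normal, we obtain

$$\frac{I_{xx}}{I_{xy}} = \frac{N_x \cdot N_x}{N_x \cdot N_y} ,$$

$$\frac{I_{yy}}{I_{xy}} = \frac{N_y \cdot N_y}{N_x \cdot N_y} ,$$

where $N_x = \partial N / \partial x$ and $N_y = \partial N / \partial y$.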

For  the  case  studied  — when  curvature  changes  are  ignored  — the  ratios  of 
the  second  derivatives  of  image  irradiance  measure  the  relative  squared  curvature 
of  the  surface.  In  other  words,  the  ratios  measure  the  relative  change  of  the  surface 
normal  as  we  move  along  orthogonal  image  directions.  However,  relative  curvature 
calculated  locally  at  each  image  point  constitutes  insufficient  information  to  allow 
surface  shape  reconstruction  in  the  absence  of  further  information  about  surface 
parameters.  From  shading  information  alone  shape  is  an  unattainable  goal. 
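
The dot-product form can be evaluated directly from the components of the unit normal and their first derivatives. The following is a minimal sketch; the function names are illustrative assumptions. It relies only on differentiating the third component, sqrt(1 - l^2 - m^2), by the chain rule.

```python
import math


def normal_derivative(l, m, dl, dm):
    """Derivative of N = (l, m, sqrt(1 - l^2 - m^2)) given those of l and m."""
    n = math.sqrt(1.0 - l * l - m * m)
    dn = -(l * dl + m * dm) / n  # chain rule on the third component
    return (dl, dm, dn)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def relative_squared_curvature(l, m, lx, ly, mx, my):
    """Return (Nx.Nx / Nx.Ny, Ny.Ny / Nx.Ny) at one image point."""
    nx = normal_derivative(l, m, lx, mx)
    ny = normal_derivative(l, m, ly, my)
    cross_term = dot(nx, ny)
    return dot(nx, nx) / cross_term, dot(ny, ny) / cross_term
```

Each dot product equals the corresponding algebraic expression in l, m and their derivatives divided by 1 - l^2 - m^2, so the common factor cancels in the ratios. Multiplying all four derivatives by the same constant also leaves both ratios unchanged, which is why the ratios carry relative rather than absolute curvature.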

If we can find surface shapes, however, for which knowledge of relative curvature
implies stronger information about the surface, e.g., surface orientation as
in  the  case  of  a sphere,  and  if  these  surface  shapes  are  reasonable  approximations 
of  the  surfaces  found  in  nature,  then  we  may  be  able  to  recover  stronger  shape 
information  locally.  Locally  there  is  not  enough  information  to  calculate  surface 
shape  without  further  knowledge,  or  without  additional  assumptions  about  surface 
shape.  Pentland’s  work  shows  that  an  assumption  of  sphericity  is  strong  enough  to 
allow  surface  orientation  to  be  calculated  locally.  Is  this  ability  to  calculate  surface 
orientation  specifically  related  to  sphericity  — or  is  it  a feature  that  is  generally  true 
when  we  restrict  the  surface  shape  to  cases  in  which  the  number  of  free  parameters 
is  no  more  than  that  for  a spherical  surface?  In  the  foregoing  discussion  we  have 
assumed  that  the  surface  is  doubly  curved.  We  shall  now  consider  the  images  of 
singularly curved surfaces.

Just as we did for doubly curved surfaces, we assume that the derivatives of
surface  curvature  can  be  ignored  when  we  consider  local  computation  of  surface 
parameters.  Differentiating  the  image  irradiance  equation,  we  obtain  the  same 
expression  as  before  for  the  doubly  curved  surface,  namely, 


$$\frac{I_{xx}}{I_{yy}} = \frac{l_x^2 + m_x^2 - (l_x m - l m_x)^2}{l_y^2 + m_y^2 - (l_y m - l m_y)^2} .$$

For a singularly curved surface ($l_x m_y - l_y m_x = 0$) when surface curvature is locally
constant, the second derivatives of image irradiance are not independent: $I_{xx} I_{yy} = I_{xy}^2$.
Consequently, we can derive only one expression relating shape and the second
derivatives of image irradiance, rather than the two expressions we derived for
doubly curved surfaces. As before, it follows that

$$\frac{I_{xx}}{I_{yy}} = \frac{N_x \cdot N_x}{N_y \cdot N_y} .$$

At first, it might appear that there is more shape information in the first derivatives
of image irradiance, for

$$I_x m_y - I_y m_x = 0 ,$$

$$I_x l_y - I_y l_x = 0 .$$

But  this  is  not  the  case,  as  the  first  and  second  derivatives  of  image  irradiance  are 
not  independent.  For  singularly  curved  surfaces,  when  we  ignore  curvature  change, 


For  the  singularly  curved  and  doubly  curved  surfaces  studied,  local  shading 
specifies  the  relative  curvature  of  the  surface  along  orthogonal  image  coordinates, 
which  is  the  most  we  can  hope  to  recover  by  local  computation.  In  general,  we 
cannot ignore curvature change over a patch. In this case, the information available
locally in the image combines data on relative curvature and curvature change. In
the restricted case in which the surface is assumed to be spherical, the surface
orientation  can  be  calculated.  However,  this  appears  to  be  a very  special  situation 
based  on  the  sphericity  assumption  rather  than  on  a restriction  in  the  number  of 
parameters  needed  to  specify  the  surface.  Since  surfaces  in  general  are  not  locally 
spherical,  one  is  forced  to  conclude  that  shading  alone  cannot  enable  prediction  of 
surface  shape  by  purely  local  computation. 

5.  Conclusions 

The  recovery  of  a scene’s  surface  shape  from  its  image  is  fundamental  to 
the  vision  process.  Our  purpose  in  processing  an  image  is  the  recovery  of  scene 
properties,  not  those  of  the  image  per  se.  In  remote  sensing  it  is  these  scene 
properties  that  we  wish  to  measure,  but,  to  extract  them,  we  have  to  understand 
how  these  scene  properties  are  manifested  in  the  image  data.  A conceptual  model  of 
the  relationship  between  scene  and  image  parameters  is  provided  by  intrinsic  images. 
Each  intrinsic  image  specifies,  for  each  point  in  the  image,  the  value  of  one  of  the 
scene  parameters  that  contribute  to  the  measured  image  intensity.  Vision  models 
try  to  recover  these  parameters  as  best  they  can,  whereupon  a type  of  relaxation 
process  adjusts  their  values  so  that  they  constitute  a consistent  interpretation  of  the 
scene’s  structure.  Which  parameters  are  specified  by  separate  intrinsic  images  and 
which  are  composite  is  unknown,  but  it  is  essential  that  they  be  estimable  without 
the  need  to  know  the  values  of  the  other  intrinsic  images.  Shape-from-shading 
proposes a source of information, namely shading, from which shape information is
to be recovered, but what shape information does it actually encode?

Local  shading  specifies  no  more  than  the  relative  curvature  of  the  scene’s 
surface  along  orthogonal  image  directions.  In  general,  even  the  recovery  of  relative 
curvature  is  complicated  by  change  in  the  curvature  of  the  surface.  However,  surface 
shape  variables  are  related  to  image  measurements  in  a fashion  that  is  not  dependent 
on  knowing  the  other  scene  parameters.  Shading  provides  direct  shape  information, 
but  this  is  not  enough  for  reconstruction  of  the  surface  shape.  Further  relationships 
between  shape  variables  and  image  properties  must  be  established  before  shape 
recovery  is  possible. 

The  various  approaches  reviewed  have  attempted  to  recover  surface  orientation 
from  shading.  To  do  so  they  have  added  extra  information,  such  as  known  boundary 
conditions  or  constraints  upon  surface  shape.  The  performance  of  these  various 
models  allows  us  to  draw  the  following  conclusions: 

• Direct  integration  of  differential  models  of  scene  properties  requires  much  a 
priori  information  and  has  to  contend  with  major  computational  problems. 

• Local  computation  must  play  a major  role  in  the  recovery  of  scene  parameters, 
but  the  models  used  have  been  overly  restrictive  in  an  effort  to  recover 
particular  information. 

• A relaxation  mechanism,  based  on  a strong  low-order  differential  model,  seems 
a viable  means  of  propagating  spatial  information  and  constraints. 

Shading provides a basis for an intrinsic image, specifying relative surface
curvature and curvature change, but this intrinsic image alone is insufficient for
surface shape recovery. Other models incorporating other image measurements are
needed to complement shading. Such models should utilize the advantages of local
computation.

Appendix A


If  a surface  is  twice  differentiable,  then 

$$l_y (1 - m^2) + m_y l m = m_x (1 - l^2) + l_x l m .$$

We  call  this  the  surface  continuity  equation,  even  though  surface  continuity  is  less 
demanding  than  the  requirement  that  the  surface  be  twice  differentiable. 

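Proof: For a continuous, twice-differentiable surface,

$$z_{xy} = z_{yx} .$$

But $p = z_x$ and $q = z_y$, so

$$p_y = q_x .$$

However,

$$p = \frac{-l}{\sqrt{1 - l^2 - m^2}} , \qquad q = \frac{-m}{\sqrt{1 - l^2 - m^2}} .$$

Hence,

$$p_y = -\frac{l_y (1 - m^2) + m_y l m}{(1 - l^2 - m^2)^{3/2}} , \qquad q_x = -\frac{m_x (1 - l^2) + l_x l m}{(1 - l^2 - m^2)^{3/2}} .$$

Then, substituting in $p_y = q_x$ yields

$$l_y (1 - m^2) + m_y l m = m_x (1 - l^2) + l_x l m .$$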


Appendix  B 

Developable surfaces are characterized by the differential equation

$$l_x m_y - l_y m_x = 0 .$$

Proof: With the exception of a cylinder whose axis is parallel to the z axis, the
differential equation defining all developable surfaces is [5]

$$z_{xx} z_{yy} - z_{xy}^2 = 0 .$$

As the surface is twice differentiable, $z_{xy} = z_{yx}$, so

$$z_{xx} z_{yy} - z_{xy} z_{yx} = 0 .$$

As $p = z_x$ and $q = z_y$, then

$$p_x q_y - p_y q_x = 0 .$$

But

$$p = \frac{-l}{\sqrt{1 - l^2 - m^2}} , \qquad q = \frac{-m}{\sqrt{1 - l^2 - m^2}} .$$

Hence,

$$p_x = -\frac{l_x (1 - m^2) + m_x l m}{(1 - l^2 - m^2)^{3/2}} , \qquad p_y = -\frac{l_y (1 - m^2) + m_y l m}{(1 - l^2 - m^2)^{3/2}} ,$$

$$q_x = -\frac{m_x (1 - l^2) + l_x l m}{(1 - l^2 - m^2)^{3/2}} , \qquad q_y = -\frac{m_y (1 - l^2) + l_y l m}{(1 - l^2 - m^2)^{3/2}} .$$

Substituting in $p_x q_y - p_y q_x = 0$ gives

$$l_x m_y - l_y m_x = 0 .$$
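
The characterization of developable surfaces can be illustrated numerically. The sketch below estimates l_x m_y - l_y m_x by central differences for a cylinder and a cone (both developable) and for a sphere (doubly curved). The surface functions, their parameters, and the step sizes are illustrative assumptions; the normal follows the sign convention p = -l / sqrt(1 - l^2 - m^2) used in the proof.

```python
import math


def direction_cosines(z, x, y, h=1e-5):
    """Components l, m of the unit normal of the surface z(x, y)."""
    p = (z(x + h, y) - z(x - h, y)) / (2 * h)
    q = (z(x, y + h) - z(x, y - h)) / (2 * h)
    norm = math.sqrt(1.0 + p * p + q * q)
    return -p / norm, -q / norm


def chi(z, x, y, h=1e-3):
    """Finite-difference estimate of l_x m_y - l_y m_x at (x, y)."""
    l_xp, m_xp = direction_cosines(z, x + h, y)
    l_xm, m_xm = direction_cosines(z, x - h, y)
    l_yp, m_yp = direction_cosines(z, x, y + h)
    l_ym, m_ym = direction_cosines(z, x, y - h)
    lx, mx = (l_xp - l_xm) / (2 * h), (m_xp - m_xm) / (2 * h)
    ly, my = (l_yp - l_ym) / (2 * h), (m_yp - m_ym) / (2 * h)
    return lx * my - ly * mx


def cylinder(x, y, r=2.0, theta=0.6):
    # Axis lies in the image plane, rotated by theta (hypothetical example).
    u = x * math.cos(theta) + y * math.sin(theta)
    return math.sqrt(r * r - u * u)


def cone(x, y):
    return math.hypot(x, y)


def sphere(x, y, r=2.0):
    return math.sqrt(r * r - x * x - y * y)
```

For the sphere of radius 2, l = x/2 and m = y/2, so the estimate should be close to 1/4; for the cylinder and the cone it should vanish up to finite-difference error.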


Appendix  C 

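The relationships between surface orientation and image irradiance for a developable
surface are

$$I_x^2 \left( m_y (1 - l^2) + l_y l m \right) = I_y^2 \left( l_x (1 - m^2) + m_x l m \right) ,$$

$$I_x m_y - I_y m_x = 0 ,$$

$$I_x l_y - I_y l_x = 0 .$$

Proof: Differentiating the image irradiance equation, $I(x, y) = R(l, m)$, we obtain

$$I_x = R_l l_x + R_m m_x ,$$

$$I_y = R_l l_y + R_m m_y .$$

Now

$$\begin{aligned}
I_x^2 \left( m_y (1 - l^2) + l_y l m \right) ={}& R_l^2 l_x \left( l_x m_y (1 - l^2) + l_x l_y l m \right) \\
&+ R_m^2 m_x \left( m_x m_y (1 - l^2) + l_y m_x l m \right) \\
&+ 2 R_l R_m l_x \left( m_x m_y (1 - l^2) + l_y m_x l m \right) ,
\end{aligned}$$

$$\begin{aligned}
I_y^2 \left( l_x (1 - m^2) + m_x l m \right) ={}& R_l^2 l_y \left( l_x l_y (1 - m^2) + l_y m_x l m \right) \\
&+ R_m^2 m_y \left( l_x m_y (1 - m^2) + m_x m_y l m \right) \\
&+ 2 R_l R_m l_y \left( l_x m_y (1 - m^2) + m_x m_y l m \right) .
\end{aligned}$$

But, for a developable surface, $l_x m_y = l_y m_x$ (see Appendix B); hence

$$\begin{aligned}
I_x^2 \left( m_y (1 - l^2) + l_y l m \right) ={}& R_l^2 l_x \left( l_y m_x (1 - l^2) + l_x l_y l m \right) \\
&+ R_m^2 m_x \left( m_x m_y (1 - l^2) + l_x m_y l m \right) \\
&+ 2 R_l R_m l_x \left( m_x m_y (1 - l^2) + l_x m_y l m \right) ,
\end{aligned}$$

$$\begin{aligned}
I_y^2 \left( l_x (1 - m^2) + m_x l m \right) ={}& R_l^2 l_y \left( l_x l_y (1 - m^2) + l_x m_y l m \right) \\
&+ R_m^2 m_y \left( l_y m_x (1 - m^2) + m_x m_y l m \right) \\
&+ 2 R_l R_m l_y \left( l_y m_x (1 - m^2) + m_x m_y l m \right) .
\end{aligned}$$

Therefore,

$$I_x^2 \left( m_y (1 - l^2) + l_y l m \right) = \left( R_l^2 l_x l_y + R_m^2 m_x m_y + 2 R_l R_m l_x m_y \right) \left( m_x (1 - l^2) + l_x l m \right) ,$$

$$I_y^2 \left( l_x (1 - m^2) + m_x l m \right) = \left( R_l^2 l_x l_y + R_m^2 m_x m_y + 2 R_l R_m l_y m_x \right) \left( l_y (1 - m^2) + m_y l m \right) .$$

However, the surface continuity equation (see Appendix A) is

$$l_y (1 - m^2) + m_y l m = m_x (1 - l^2) + l_x l m .$$

We have the required result, i.e., that the relationship between surface orientation
and image irradiance for a developable surface is

$$I_x^2 \left( m_y (1 - l^2) + l_y l m \right) = I_y^2 \left( l_x (1 - m^2) + m_x l m \right) .$$

In terms of $p$ and $q$, the equivalent form is

$$I_x^2 q_y - I_y^2 p_x = 0 .$$

In terms of depth $z$, the equivalent form is

$$I_x^2 z_{yy} - I_y^2 z_{xx} = 0 .$$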
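
Because the derivation does not depend on the form of the reflectance function, the depth form of the relation can be checked numerically with an arbitrary smooth reflectance map. The sketch below does this for a cylinder and, for contrast, a sphere. The reflectance function, the surfaces, and the step sizes are hypothetical choices made only for illustration.

```python
import math


def direction_cosines(z, x, y, h=1e-5):
    """Components l, m of the unit normal of the surface z(x, y)."""
    p = (z(x + h, y) - z(x - h, y)) / (2 * h)
    q = (z(x, y + h) - z(x, y - h)) / (2 * h)
    norm = math.sqrt(1.0 + p * p + q * q)
    # Sign convention of Appendix A: p = -l / sqrt(1 - l^2 - m^2).
    return -p / norm, -q / norm


def developable_residual(z, reflectance, x, y, h=1e-3):
    """Finite-difference estimate of Ix^2 * z_yy - Iy^2 * z_xx at (x, y)."""

    def image(u, v):
        # Image irradiance equation I(x, y) = R(l, m).
        return reflectance(*direction_cosines(z, u, v))

    ix = (image(x + h, y) - image(x - h, y)) / (2 * h)
    iy = (image(x, y + h) - image(x, y - h)) / (2 * h)
    zxx = (z(x + h, y) - 2 * z(x, y) + z(x - h, y)) / h**2
    zyy = (z(x, y + h) - 2 * z(x, y) + z(x, y - h)) / h**2
    return ix * ix * zyy - iy * iy * zxx


def example_reflectance(l, m):
    # Hypothetical smooth, non-Lambertian reflectance map, chosen arbitrarily.
    return 0.5 + 0.3 * l - 0.2 * m * m + 0.4 * l * m


def cylinder(x, y, r=2.0, theta=0.6):
    u = x * math.cos(theta) + y * math.sin(theta)
    return math.sqrt(r * r - u * u)


def sphere(x, y, r=2.0):
    return math.sqrt(r * r - x * x - y * y)
```

The residual should vanish (up to finite-difference error) for the cylinder whatever reflectance map is supplied, and should in general be nonzero for the sphere.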

Note that, in addition,

$$I_x m_y - I_y m_x = R_l (l_x m_y - l_y m_x) ,$$

$$I_x l_y - I_y l_x = R_m (l_y m_x - l_x m_y) .$$

Hence, for a developable surface, $l_x m_y - l_y m_x = 0$, we obtain the required results

$$I_x m_y - I_y m_x = 0 ,$$

$$I_x l_y - I_y l_x = 0 .$$

ABSTRACT

Texture  is  known  to  be  important  in  the  analysis  of 
radar  images  for  geologic  applications.  It  has  previously 
been  shown  that  texture  features  derived  from  the  grey-level 
co-occurrence  matrix  (GLCM)  can  be  used  to  separate  large 
scale  texture  in  radar  images.  Here  the  influence  of  sensor 
parameters,  specifically  the  spatial  and  radiometric 
resolution  and  flight  parameters,  i.e.,  the  orientation  of 
the  surface  structure  relative  to  the  sensor,  on  the  ability 
to  classify  texture  based  on  the  GLCM  features  is 
investigated.  It  was  found  that  changing  these  sensor  and 
flight parameters greatly affects the usefulness of the GLCM
for classifying texture on radar images.

I. Introduction

Spectral, textural, temporal, and contextual features are
four  important  pattern  elements  used  in  human  interpretation 
of  image  data  in  general  and  SAR  data  in  particular. 

Spectral  features  describe  the  average  band-to-band  tonal 
variations  in  a multi-band  image  set,  whereas  textural 
features  describe  the  spatial  distribution  of  tonal  values 
within  a band.  Contextual  features  contain  information 
about  the  relative  arrangement  of  image  segments  belonging 
to  different  categories,  and  temporal  features  describe 
changes  in  image  attributes  as  a function  of  time.  However, 
when  small  image  areas  within,  say,  a synthetic  aperture 
radar  (SAR)  image  are  independently  processed  on  a computer 
for  automated  analyses  only  the  tonal  and  textural  features 
are  usually  available  in  making  decisions. 

In many of the automated procedures for processing
radar image data from small areas, such as in
crop-classification studies, only the average tonal values
are used for developing a classification algorithm.
Textural features are generally ignored on the basis that
the poor resolution of radar imagery does not provide
meaningful textural information for such applications since
the areal extent of the target is usually small. However,
there are many other applications such as the identification
of large scale geological formations, land-use patterns,
etc., where the resolution is more than adequate to provide
textural information. Indeed, in these applications texture
is probably the most important image feature. It was
previously shown [1] that texture features derived from the
grey-level co-occurrence matrix (GLCM) can be used to
discriminate  texture  in  radar  images.  We  describe  in  this 
paper  the  influence  of  sensor  and  flight  parameters  on  our 
ability  to  quantitatively  classify  textures  in  radar  images 
using  the  GLCM.  The  effect  of  spatial  and  radiometric 
resolution  on  texture  classification  was  studied  in  one 
experiment. It was found that the classification was very
sensitive to these sensor parameters; only the image with
the best spatial and radiometric resolutions was
quantitatively useful. Another experiment was conducted to
determine  how  different  flight  paths,  i.e.,  looking  at  the 
same  terrain  from  different  angles  with  the  same  sensor  and 
incidence  angle,  changed  the  texture  classification. 

Optical  imaging  systems  rely  on  the  sun  to  illuminate  the 
scene  and  thus  the  sun  angle  becomes  a factor;  however, 
mission  profiles  for  these  sensors  are  usually  designed  to 
minimize  this  effect.  For  example,  the  LANDSAT  series  of 
sensors  uses  a high  sun  angle.  On  the  other  hand,  imaging 
radars  provide  their  own  illumination  and  it  is  not  clear 
what  effect  observing  the  same  geologic  structure  from 
different  angles  will  have  on  the  automated  analysis. 

In the following section the texture features used here
to separate different surface structures are briefly
described. The sensitivity of these texture features to
changes  in  radiometric  and  spatial  resolution  is  discussed 
next.  Radar  image  simulation  is  then  used  to  evaluate  the 
sensitivity  of  GLCM  texture  features  to  changes  in  the 
orientation  of  the  surface  structure  and  the  radar.  The 
results  of  the  two  studies  described  in  this  paper  indicate 
that  the  usefulness  of  textural  features  in  automated 
analysis  of  radar  images  is  sensitive  to  changes  in  the 
spatial  and  radiometric  resolution  of  the  system  as  well  as 
the  target/sensor  geometry. 

II.  The  Texture  Features 

The  textural  feature  extraction  algorithm  employed  here 
has  been  widely  used  [2-5]  for  analyzing  a variety  of 
photographic  images.  The  procedure  is  based  on  the 
assumption  that  the  texture  information  in  an  image  block 
'I'  is  contained  in  the  overall  or  'average'  spatial 
relationship  which  the  grey  tones  in  the  image  'I'  have  to 
one  another.  This  relationship  can  be  characterized  by  a 
set  of  grey-level  co-occurrence  (GLC)  matrices.  We  describe 
a procedure  for  computing  a set  of  GLC  matrices  for  a given 
image  block  and  define  a set  of  numerical  textural 
descriptors  (features)  that  can  be  extracted  from  the  GLC 
matrices.  These  textural  features  can  be  used  for  automated 
analysis  and  classification  of  blocks  of  radar  imagery. 

Image  texture  may  be  viewed  as  a global  pattern  arising  from 
a deterministic or random repetition of local subpatterns or
primitives.  The  structure  resulting  from  this  repetition 
could  be  very  useful  for  discriminating  between  the  contents 
of  the  image  of  a complex  scene.  A number  of  approaches 
have  been  suggested  for  extracting  features  that  will 
discriminate  between  different  textures  [2-6].  Of  these 
approaches,  it  has  been  found  that  textural  features  derived 
from  grey-level  co-occurrence  matrices  (GLCM)  are  most 
useful  for  analyzing  the  contents  of  a variety  of  imagery  in 
remote  sensing,  biomedical  and  other  applications  [7-11]. 

The  GLCM  approach  to  texture  analysis  is  based  on  the 
conjecture  that  the  texture  information  in  an  image  is 
contained  in  the  overall  or  average  spatial  relationship 
between  the  grey  tones  of  the  image. 

The second-order grey-level co-occurrence matrix of an
image is defined as follows. Let $f(x,y)$ be a rectangular
digital picture defined over the domain $x \in [0, n_x)$,
$y \in [0, n_y)$, $x, y \in I$. Let $n_g$ be the number of grey
levels in $f$. The unnormalized, second-order GLC matrix is a
square matrix $P$ of dimension $n_g$. The $(i,j)$-th entry in
$P$, denoted by $P_{ij}$, is a function of the image tonal
values and a displacement vector $d = (d_x, d_y)$. The entries
are unnormalized counts of how many times two neighboring
resolution cells which are spatially separated by $d$ occur
on the image, one with grey tone $i$ and the other with grey
tone $j$. That is,

$$P_{ij} = \#\{((m_1,n_1),(m_2,n_2)) \mid f(m_1,n_1) = i,\ f(m_2,n_2) = j,\ \text{and}\ (m_2,n_2) - (m_1,n_1) = d\}, \tag{1}$$

where $\#$ denotes the number of elements in the set, and the
indices $m_1$, $m_2$ and $n_1$, $n_2$ take on integer values in
the intervals $[0, n_x)$ and $[0, n_y)$. The normalized GLC
matrix, with entries $p_{ij}$, is obtained from $P$ by dividing
each entry in $P$ by the total number of paired occurrences.
The definition of second-order GLC matrices can be extended to
include third- and higher-order GLC matrices. While
higher-order GLC matrices may be important in some
applications, much of the recent work in texture analysis
has been based on second-order GLC matrices.

The  second-order  GLC  matrices  are  computed  for  various 
values  of  the  displacement  vector  d,  and  features  derived 
from  the  GLC  matrices  are  used  for  classifying  the  contents 
of an image.
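
The counting rule in equation (1) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names `glcm` and `normalize`, the use of plain lists, and the 4 x 4 test block are assumptions made here. Pairs are counted in one direction only (from a cell to the cell displaced by d); some implementations also count the reverse pair so that the matrix is symmetric.

```python
def glcm(image, d, n_g):
    """Unnormalized second-order GLC matrix for displacement d = (dx, dy).

    image[y][x] holds the grey tone f(x, y), an integer in [0, n_g).
    Entry P[i][j] counts cell pairs with f(x, y) = i and f(x + dx, y + dy) = j.
    """
    dx, dy = d
    n_y, n_x = len(image), len(image[0])
    P = [[0] * n_g for _ in range(n_g)]
    for y in range(n_y):
        for x in range(n_x):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < n_x and 0 <= y2 < n_y:  # both cells must lie inside the block
                P[image[y][x]][image[y2][x2]] += 1
    return P


def normalize(P):
    """Divide every count by the total number of paired occurrences."""
    total = sum(map(sum, P))
    if total == 0:
        raise ValueError("displacement leaves no cell pairs inside the block")
    return [[count / total for count in row] for row in P]


# Illustrative 4 x 4 block with four grey levels (not data from the study).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
P = glcm(image, d=(1, 0), n_g=4)  # horizontal neighbours at distance 1
p = normalize(P)
print(P)
```

Repeating the call with other displacement vectors yields the set of GLC matrices from which the texture features are derived.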

Some of the commonly used textural features derived
from the GLC matrix are:

1) Uniformity (sum of squares):

$$\sum_i \sum_j p_{ij}^2 \tag{2a}$$

2) Contrast:

$$\sum_i \sum_j (i-j)^2 \, p_{ij} \tag{2b}$$

3) Correlation:

$$\frac{\sum_i \sum_j (i-\mu_x)(j-\mu_y) \, p_{ij}}{\sigma_x \sigma_y} \tag{2c}$$

4) Entropy:

$$-\sum_i \sum_j p_{ij} \log p_{ij} \tag{2d}$$

5) Inverse Difference Moment:

$$\sum_i \sum_{j,\ j \neq i} \frac{(p_{ij})^{\lambda}}{|i-j|^{\nu}} \tag{2e}$$

6) Maximum Probability:

$$\max_{i,j} \, p_{ij} \tag{2f}$$

For a variety of imagery (aerial, micrographic and x-ray)
the relationship between these textural features, their
values and what they represent in terms of visual perception
of texture is reasonably well understood. Using features
of  the  form  given  above,  Haralick  and  Shanmugan  [5-7]  were 
able  to  classify  a variety  of  images  with  over  85% 
classification  accuracy.  These  features  have  also  been  used 
to  separate  texture  in  radar  images  [1]. 
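
As a hedged sketch of how the six features (2a) to (2f) could be evaluated from a normalized GLC matrix, the Python below uses only the standard library. Several points are assumptions made here rather than statements from the paper: the function name `texture_features`; the reading of the means and standard deviations in the correlation feature as those of the row and column marginals of the matrix; the natural logarithm and leading minus sign in the entropy; and the two exponents of the inverse difference moment, which are hard to read in the scanned source and are therefore exposed as parameters `lam` and `nu` defaulting to 1.

```python
import math


def texture_features(p, lam=1.0, nu=1.0):
    """Six GLCM texture features from a normalized matrix p (list of rows)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_x = sum(i * v for i, j, v in cells)
    mu_y = sum(j * v for i, j, v in cells)
    sd_x = math.sqrt(sum((i - mu_x) ** 2 * v for i, j, v in cells))
    sd_y = math.sqrt(sum((j - mu_y) ** 2 * v for i, j, v in cells))
    covariance = sum((i - mu_x) * (j - mu_y) * v for i, j, v in cells)
    # Correlation is undefined for a block with a single grey tone.
    correlation = covariance / (sd_x * sd_y) if sd_x * sd_y > 0 else float("nan")
    return {
        "uniformity": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "correlation": correlation,
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
        "inverse_difference_moment": sum(
            v ** lam / abs(i - j) ** nu for i, j, v in cells if i != j
        ),
        "maximum_probability": max(v for _, _, v in cells),
    }


# Illustrative two-level matrix: every pair has equal tones (not study data).
p_demo = [[0.5, 0.0], [0.0, 0.5]]
features = texture_features(p_demo)
for name, value in features.items():
    print(name, value)
```

For this diagonal example the contrast and inverse difference moment are zero and the correlation is one, which matches the intuition that neighbouring cells always share the same tone.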

III.  Texture  Analysis  Of  SAR  Images  With  Different 
Spatial  And  Radiometric  Resolution 

Numerical  descriptions  of  texture  (specifically  those 
derived  from  the  grey-level  co-occurrence  matrix  (GLCM)  as 
in  Section  II)  have  been  shown  to  separate  some  simple 
geological  features  [1].  To  efficiently  design  a spaceborne 
SAR for geologic exploration it is of interest to determine
how the ability to separate geological features using the
GLCM-derived features varies with important system
parameters, e.g., spatial and radiometric resolution.

A limited  set  of  radar  images  with  different  spatial 
and  radiometric  resolutions  were  obtained  (primarily  from 
the  Jet  Propulsion  Laboratory  [12]).  These  images  were 
generated  by  appropriate  processing  of  the  Seasat-A  SAR 
video  signal,  and  were  of  a geologically  interesting  area  in 
Tennessee  (Figure  1).  The  specific  areas  that  were  studied 
are  outlined  in  white.  The  combinations  of  spatial  and 
radiometric  resolution  contained  in  this  data  set  were  (25 
m,  4 looks),  (50  m,  4 looks),  (100  m,  4 looks),  (50  m,  2 
looks),  and  (50  m,  1 look).  Within  the  Tennessee  test  area, 
five  distinct  textures  were  identified  (see  Table  1 for  a 
description  of  the  geology  and  topography)  and  five  to  seven 
samples  of  each  texture  obtained  (see  Figure  1).  A sample 
of  a texture  is  an  image  (in  this  case  3.4  km  x 3.4  km  in 
size) containing only one texture type. Thus for each set
of sensor parameters 30 texture samples were obtained, and a
total of 150 texture samples (images) were used in this
study. For each texture sample a GLCM was calculated and
texture  features  found.  Specifically,  uniformity,  contrast, 
correlation,  entropy,  inverse  difference  moment,  and  maximum 
probability  were  the  texture  features  used  here.  Following 
[1]  the  GLCM  were  calculated  for  distances  of  1,  2 and  4 at 
angles of 0°, 45°, 90°, and 135°. The above texture
features were calculated for each distance and angle. In
addition, the average over all angles for each texture
feature  was  calculated.  Thus  each  texture  sample  is 
described  by  a set  of  30  numbers  (6  texture  features,  4 
angles,  and  the  average  for  each  feature). 
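
The bookkeeping behind the 30-number description can be sketched as follows. The mapping from a GLC angle and distance to a displacement vector is an assumed convention (row index increasing downward, so 45° points up and to the right, and a diagonal step of distance n moves n cells in each axis); the paper does not spell out its convention. The names `displacement` and `with_angle_average` are hypothetical, and the feature values are placeholders rather than measurements. The count of 30 is read here as applying to a single distance.

```python
GLC_ANGLES = (0, 45, 90, 135)
# Unit steps (dx, dy) with the row index increasing downward: an assumed convention.
UNIT_STEPS = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}
FEATURES = (
    "uniformity", "contrast", "correlation", "entropy",
    "inverse_difference_moment", "maximum_probability",
)


def displacement(distance, angle_deg):
    """Displacement vector d = (dx, dy) for a GLC distance and angle."""
    ux, uy = UNIT_STEPS[angle_deg]
    return (distance * ux, distance * uy)


def with_angle_average(per_angle):
    """Flatten {angle: {feature: value}} and append the average over angles."""
    flat = {}
    for name in FEATURES:
        values = [per_angle[angle][name] for angle in GLC_ANGLES]
        for angle, value in zip(GLC_ANGLES, values):
            flat[(name, angle)] = value
        flat[(name, "avg")] = sum(values) / len(values)
    return flat


# Placeholder feature values (the angle itself), used only to show the bookkeeping.
per_angle = {angle: {name: float(angle) for name in FEATURES} for angle in GLC_ANGLES}
descriptor = with_angle_average(per_angle)
print(len(descriptor))       # 6 features x (4 angles + 1 average) = 30
print(displacement(4, 45))   # (4, -4)
```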

Scatter  diagrams  of  the  numerical  values  for  one  pair 
of  texture  features  are  shown  in  Figure  2.  These  plots  are 
for  distance  4 and  result  from  averaging  all  four  GLCM 
angles.  All  five  textures  can  be  separated  using  the 
correlation  and  maximum  probability  (Figure  2)  features  only 
for  the  system  with  a 25  m spatial  resolution  and  with  four 
independent  samples  averaged.  As  either  the  radiometric 
resolution  is  degraded  (decreased  number  of  independent 
samples  or  looks  averaged)  or  the  spatial  resolution  is 
degraded  the  ability  to  separate  these  textures  is  also 
degraded.  This  same  result  was  found  for  other  combinations 
of  texture  features  [13].  In  all  cases  only  the  images  with 
25  m,  4 looks  could  be  quantitatively  used  to  separate  these 
textures  using  the  GLCM. 

This  experiment  reinforces  the  conclusions  of  our 
previous work [1]: automatically derived texture features
can be used to discriminate texture in radar images of rough
terrain.  Additionally,  this  study  shows  that  the  ability  to 
use  the  GLCM  to  classify  texture  is  strongly  dependent  upon 
both the sensor's spatial and radiometric resolution. Even
though the data set used for this study was very limited,
these results do indicate that the usefulness of textural
features  for  radar  image  analysis  is  sensitive  to  the 
spatial  and  radiometric  resolutions  of  the  sensor.  This 
should  be  expected  because  it  is  well  known  that  for  manual 
analysis  the  interpretability  of  radar  images  is  sensitive 
to  the  radiometric  and  spatial  resolutions  [14-17].  Thus, 
this  study  demonstrated  that  this  sensitivity  also  exists 
for  automatic  analysis. 

IV.  A Study  Of  The  Effect  Of  Look  Direction 
On  Texture  In  SAR  Images 

For  an  automatic  texture  analysis  system  for  radar  to 
be  successful,  a set  of  texture  features  must  be  found  which 
are  invariant  to  the  flight  path  of  the  sensor.  This 
invariance  is  clearly  needed  because  the  orientation  of  the 
terrain  features  relative  to  the  sensor's  flight  path  is  not 
known  a priori.  For  the  geologic  analysis  of  radar  imagery 
where  terrain  elevation  plays  a dominant  role  the  imaging 
geometry  of  radar  would  seem  to  be  a dominant  factor.  Also 
the  question  of  invariance  is  important  in  the  search  for 
'optimum'  sensor  configurations.  For  example,  it  might  be 
possible  to  classify  certain  terrain  features  at  one  sensor 
orientation  but  not  at  another.  However,  because  the 
orientation  of  the  sensor  to  the  terrain  features  of 
interest  will  never  be  known  a priori  an  optimum  sensor 
configuration  might  not  exist. 

The purpose of this section is to describe the results
of an experiment aimed at determining the sensitivity of
GLCM texture features, shown to be valuable as
discriminants, to the sensor flight direction, i.e., the
target/sensor orientation. It was found (given the
limitations of the experiment) that the textures considered
here could be classified for one or two target/sensor
orientations but not for all three orientations considered.

To isolate the effect of sensor look direction it was
necessary to use radar simulation [18] to create a set of
images with controlled terrain and sensor parameters.
Further, it was possible using the simulation approach to
remove (i.e., not include) the effect of speckle [18].
Therefore, this study focused on how shadow, layover, and
range compression changed the image manifestations of
complex terrain structure as the look direction of the
sensor was varied.

In radar image simulation (for a complete description
see [18]), the terrain to be analyzed is represented as a
two-dimensional integer array referred to as a data base.
This array is stored on a file containing fixed-length
records. These correspond directly to rows in the array
which contain a fixed number of words (columns). This
relationship is shown in Figure 3.
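
The record-to-row correspondence can be illustrated with a short Python sketch. The word size and byte order are not given in the paper; 16-bit little-endian signed integers are assumed here purely for illustration, and `read_data_base` is a hypothetical helper name.

```python
import struct


def read_data_base(raw, n_columns, word_format="<h"):
    """Split raw bytes into fixed-length records, one record per array row.

    word_format is a two-character struct code: byte order, then word type.
    The default (little-endian 16-bit signed integer) is an assumption.
    """
    record_size = n_columns * struct.calcsize(word_format)
    if len(raw) % record_size:
        raise ValueError("file length is not a whole number of records")
    row_format = word_format[0] + str(n_columns) + word_format[1]
    return [
        list(struct.unpack_from(row_format, raw, offset))
        for offset in range(0, len(raw), record_size)
    ]


# Two records of three words each, packed in memory instead of read from a file.
raw = struct.pack("<6h", 10, 11, 12, 20, 21, 22)
rows = read_data_base(raw, n_columns=3)
print(rows)  # [[10, 11, 12], [20, 21, 22]]
```

With this layout, `rows[y][x]` addresses the word in column x of record y, which is the row and column relationship the text describes.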

The three data bases used in this study were generated
from data received from the U.S. Geological Survey in the
form of three digital elevation models. These contained
elevation values which correspond directly to a
1:24000 (1 inch = 2000 feet) topographical map sampled at 30
meter intervals in both the x and y directions. Let x
refer to columns in our data base and y to rows (see
Figure 3). In these data Δx and Δy both represent 30 meters
on the ground. Thus each elevation value was considered to
be valid for an area of 30 x 30 meters.

The third dimension of the data base, h, represents the
elevation of each cell above a given reference elevation.
Each increment in elevation corresponds to Δh, which
describes a scaling factor for determining the quantization
of the actual elevation. In the digital elevation models
used in this study, the value for Δh was equal to one meter.
This led to a convenient one-to-one relationship for the
elevations.

The relationship among the values for Δx, Δy and Δh
describes the degree to which the elevation changes over an
area on the ground. Since only the relative structures of
the terrain are of interest in this study, this relationship
may be altered as needed. After first removing the
reference elevation constant the data was scaled by 0.25.
This allowed the Δx and Δy values to represent 7.5 meters,
while the value for Δh remained equal to 1 meter.
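
One reading of this scaling step is sketched below: the reference elevation is subtracted from every cell, and the horizontal cell size is relabelled from 30 m to 0.25 x 30 m = 7.5 m while the elevation increment stays at 1 m, which makes the relative relief four times steeper. The helper name `prepare_data_base`, the choice of reference value, and the small elevation grid are hypothetical.

```python
def prepare_data_base(elevations, reference, spacing_m=30.0, scale=0.25):
    """Remove the reference elevation and rescale the horizontal cell size."""
    relative = [[h - reference for h in row] for row in elevations]
    return relative, spacing_m * scale


# Hypothetical 2 x 2 patch of elevations in metres (not data from the study).
dem = [[310, 312], [315, 320]]
relative, spacing = prepare_data_base(dem, reference=300)

delta_h = 1.0  # metres per elevation increment, unchanged by the scaling
print(relative)           # [[10, 12], [15, 20]]
print(spacing)            # 7.5
print(delta_h / 30.0, delta_h / spacing)  # increment-to-cell-size ratio before and after
```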

The simulation of synthetic aperture radar imagery is
made possible through the application of a computer program
developed  at  the  University  of  Kansas  Remote  Sensing 
Laboratory  [18].  This  algorithm  will  simulate  the  effects 
of  a spaceborne  SAR  with  a look  direction  parallel  to  the 
rows  of  the  data  base  array.  Since  the  simulation  program 
always  processes  the  data  row-by-row,  the  only  way  to 
achieve  a different  look  direction  is  to  modify,  i.e., 
rotate,  the  data  base.  Keeping  this  in  mind,  the  unmodified 
data  base  is  defined  to  be  at  a look  direction  angle  of  0°. 
For  this  study  simulated  radar  imagery  was  to  be  generated 
for  the  same  areas  with  look  directions  of  0°,  45°,  and  90°. 
This  required  that  the  data  bases  be  correctly  oriented 
before  the  simulation  was  performed.  For  this,  computer 
programs  were  applied  to  rotate  the  original  data  in  order 
to  simulate  different  look  angles.  Nine  data  bases  were 
thus  available  for  simulation  (3  terrain  models  at  3 look 
directions).  These  nine  data  bases  were  then  processed 
using  the  simulation  program.  The  radar  parameters  used  for 
the  simulation  were  similar  to  those  of  the  Seasat-A  SAR. 
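
A minimal sketch of the data base rotation step is given below. It is not the program used in the study: `rotate_grid` is a hypothetical helper that rotates a grid about its centre with nearest-neighbour resampling, and the sense of rotation (clockwise as displayed, with the row index increasing downward) is an assumption. Cells that map outside the original grid receive a fill value.

```python
import math


def rotate_grid(grid, angle_deg, fill=0):
    """Rotate a 2-D grid about its centre using nearest-neighbour resampling."""
    n_rows, n_cols = len(grid), len(grid[0])
    cy, cx = (n_rows - 1) / 2, (n_cols - 1) / 2
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    out = [[fill] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for col in range(n_cols):
            dx, dy = col - cx, r - cy
            # Inverse mapping: find the source cell that lands on (r, col).
            sx = int(round(c * dx + s * dy + cx))
            sy = int(round(-s * dx + c * dy + cy))
            if 0 <= sx < n_cols and 0 <= sy < n_rows:
                out[r][col] = grid[sy][sx]
    return out


# Small illustrative grid standing in for an elevation data base.
terrain = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
rotated = {angle: rotate_grid(terrain, angle) for angle in (0, 45, 90)}
print(rotated[90])  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
```

A 45° rotation requires resampling and leaves some cells without a source value, so in practice a data base larger than the area of interest would be rotated and then cropped.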

The  altitude  of  the  sensor  was  considered  to  be  roughly  800 
kilometers,  and  the  angle  of  incidence  between  the  sensor 
and  the  first  cell  of  the  data  base  was  given  to  be  20 
degrees.  For  the  purposes  of  this  study,  it  was  assumed 
that  all  of  the  terrain  data  was  of  one  scattering  category. 
The  scattering  coefficient  as  a function  of  incidence  angle 
is  shown  in  Figure  4. 

Using these parameters along with the assumed value of
7.5  meters  for  both  along-track  and  across-track 
resolutions,  radar  images  were  simulated,  producing  the 
desired  set  of  controlled  images.  However,  these  images  are 
now  rotated  relative  to  each  other.  To  eliminate  the 
rotational  dependence  of  the  GLCM  the  simulations  were 
converted  to  one  coordinate  system. 

Visually  the  effect  of  changing  the  flight  path  is 
dramatic.  Figure  5a-c  contains  the  simulated  radar  images 
for  one  of  the  digital  terrain  models.  In  Figure  5a  the 
sensor’s  look  direction  is  from  right  to  left.  This  is  our 
reference  direction  and  is  referred  to  as  the  0°  look  angle. 
The  simulation  of  a 45°  look  angle  (i.e.,  from  the  upper 
left  to  the  lower  right)  is  shown  in  Figure  5b  and  the  90° 
simulation  (i.e.,  from  top  to  bottom)  is  shown  in  Figure  5c. 
Similarly,  Figures  6a-c  and  7a-c  contain  the  image 
simulations  for  two  other  digital  terrain  models.  Close 
analysis  of  these  images  reveals  many  features  which  are 
totally  obscured  by  shadow  at  one  look  angle  but  not  at  the 
others  as  was  shown  in  [17].  Also,  the  spatial  structure 
changes  as  the  look  angle  is  varied  from  0°  to  45°  to  90°. 

Beginning with the 0° look direction, three distinct
spatial structures (textures) were identified.

TEXTURE 1 contained low relief with some small hills and
          ridges; maximum relief is about 100-300 feet.

TEXTURE 2 contained elongated ridges and mountains usually
          separated by steep gradient streams; maximum
          relief is about 500-700 feet.

TEXTURE 3 contained long, narrow valleys with steep slopes
          and depths of about 300-400 feet. Valley streams
          have medium to low gradients.

From  each  texture,  3 or  4 samples  (subimages)  were  obtained. 
The  same  subimages  were  then  sampled  from  the  45°  and  90° 
look  direction  simulations.  A total  of  33  subimages 
provided  the  input  for  this  experiment  (11  subimages  for 
each  look  direction).  These  subimages  are  shown  in  Figure 
8a-c. The specific research questions addressed by this
experiment were (1) can these three textures be classified
using GLCM features at any of the three look directions, and
(2) can these three textures be classified using GLCM
features independent of the look direction, i.e., are the
texture features derived from the same spatial structure
independent of the look direction of the sensor, so that
the textures can be classified using all three orientations
simultaneously?

For  each  of  the  33  subimages  described  above  a GLCM  and 
the  resulting  texture  features  were  calculated  for  distances 
of  4,  6,  and  10  at  0°,  45°,  90°,  and  135°  (these  angles  will 
be  referred  to  as  GLC  angles  as  opposed  to  the  look 
direction  angle).  It  was  found  [13]  that  distances  4 and  10 
showed basically the same trend as 6 so only the distance 6
results  will  be  discussed.  Also,  it  was  found  that 
averaging  the  texture  features  over  the  GLC  angle  as  was 
done  previously  [1]  destroyed  our  ability  to  separate 
textures  one  and  two.  This  is  expected  from  their  spatial 
structure.  Thus  only  results  from  individual  GLC  angles 
will  be  presented.  The  GLCM  texture  features  were  analyzed 
pair-wise  as  was  also  done  previously  [1]. 
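
A toy example suggests why averaging over the GLC angle can erase a directional difference. The two striped blocks below are synthetic and are not the textures of this study; the contrast feature is computed directly as the mean squared grey-tone difference over cell pairs, which equals the sum in (2b) for a normalized GLC matrix. The angle-to-step convention is an assumption.

```python
STEPS = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}  # assumed convention


def contrast(image, d):
    """Contrast: mean squared grey-tone difference over cell pairs separated by d."""
    dx, dy = d
    total = pairs = 0
    for y, row in enumerate(image):
        for x, tone in enumerate(row):
            x2, y2 = x + dx, y + dy
            if 0 <= y2 < len(image) and 0 <= x2 < len(row):
                total += (tone - image[y2][x2]) ** 2
                pairs += 1
    return total / pairs


blocks = {
    "horizontal": [[y % 2] * 6 for y in range(6)],               # stripes along rows
    "vertical": [[x % 2 for x in range(6)] for _ in range(6)],   # stripes along columns
}

results = {}
for name, block in blocks.items():
    per_angle = {angle: contrast(block, step) for angle, step in STEPS.items()}
    average = sum(per_angle.values()) / len(per_angle)
    results[name] = (per_angle, average)
    print(name, per_angle, average)
```

The per-angle contrasts of the two blocks differ at 0° and 90°, yet their averages over the four angles are identical, so only the individual GLC angles distinguish them.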

Analysis  of  the  data  qualitatively  showed  that  all 
three  textures  could  be  classified  at  one  or  two 
target/sensor  orientations  but  not  at  all  three 
simultaneously. For example, Figure 9a-c contains the
scattergrams  for  the  maximum  probability  and  contrast 
texture  features  at  GLC  distance  6 and  GLC  angle  of  0°.  At 
a look  direction  of  0°  (Figure  9a)  none  of  the  three 
textures  can  be  separated,  while  at  45°  (Figure  9b)  all 
three  textures  can  be  classified.  Analysis  of  other  texture 
pairs  shows  the  same  trend,  i.e.,  the  textures  considered 
here  can  be  classified  for  one  or  two  sensor  look  directions 
but  not  at  all  three  [13].  If  the  texture  samples  for  each 
terrain  structure  from  all  three  look  directions  are 
combined  it  becomes  obvious  that  the  textures  considered 
here  cannot  be  classified  independent  of  look  direction  (see 
Figure 10a-e).

The  purpose  of  this  analysis  was  to  determine  the 
sensitivity  that  GLCM  texture  features  show  to  changes  in 
the orientation of the surface structure relative to the
sensor.  Radar  image  simulation  was  used  to  generate  a 
suitable  set  of  images  with  the  effects  of  the  sensor  flight 
path  isolated.  Within  the  limitations  of  this  experiment, 
i.e.,  three  different  terrain  structures,  and  three  flight 
directions,  it  was  shown  that  (1)  the  GLCM  texture  features 
can  be  used  to  classify  the  terrain  structures  at  one  or  two 
flight  directions  but  not  at  all  three,  and  (2)  the  GLCM 
texture  features  cannot  be  used  to  classify  these  terrain 
structures independent of the flight path. The search for
the optimum set of sensor parameters for geologic
applications is thus complicated. That is, the results of
this study indicate that the optimum sensor for classifying
(using either manual or automatic techniques) surface
structure is dependent upon the orientation of the structure
to the flight path of the sensor. Because of the
monostatic nature of radar imaging the same surface
structure imaged at two different flight angles can (and
often does) appear totally dissimilar. A set of sensor
parameters  optimized  to  detect  these  structures  at  one 
flight  angle  might  be  totally  different  if  the  flight  angle 
were  changed. 

V.  Conclusions And Recommendations

Texture is an important characteristic of radar images
of rough terrain. It was shown that the GLCM-derived
texture features can be used to classify texture. In this
paper  we  have  demonstrated  that  GLCM  derived  texture 
features  are  sensitive  to  both  sensor  and  flight  parameters. 
In  fact,  we  lose  our  ability  to  classify  texture  by  these 
features  if  either  the  radiometric  or  spatial  resolution  is 
degraded.  We  also  found  that  these  texture  features  are 
sensitive  to  the  sensor  flight  path.  We  could  classify  the 
surface  structure  for  one  or  two  target/sensor  orientations 
but  not  for  all  three  considered  simultaneously.  That  is, 
GLCM  texture  features  cannot  be  used  to  classify  texture 
independent  of  the  flight  path. 

While  general  conclusions  on  the  sensitivity  of 
textural  features  to  system  and  flight  parameters  can  be 
made  from  the  results  of  this  study,  there  is  a need  to 
further refine these conclusions. Specifically, it is
recommended that the sensitivity shown here be
quantitatively studied. Quantitative results are needed to
help  guide  system  design  and  flight  planning.  Two 
approaches  to  obtaining  quantitative  results  should  be 
pursued  in  parallel.  First,  an  analytic  study  of  the 
relationships  among  surface,  sensor  and  flight  parameters 
and  the  GLCM  is  needed.  Second,  more  radar  images  should  be 
analyzed.  With  more  data  the  qualitative  discussion  of  the 
effects  of  spatial  and  radiometric  resolution  can  be 
extended to a quantitative analysis; for example, plots of
the 'variance' of each cluster as a function of resolution
could then be studied. The ultimate goal of such an
analysis  would  be  an  expression  for  the  sensitivity  of  each 
texture  feature  as  a function  of  resolution.  This  study 
also  dealt  with  only  radical  changes  in  the  flight  direction 
over  a fixed  site.  Further  analysis  is  now  needed  to 
determine  the  effect  of  small  angle  changes,  e.g.,  on  the 
order of 5°. Also, this study considered only one angle of
incidence. It would be interesting to determine if there
exists some incidence angle for which we could classify
surface structure independent of the flight angle.

Figure 2.  Scatter Diagrams for Texture Feature
           Pairs as a Function of Spatial
           and Radiometric Resolution
           (Average Over All GLC Angles).
           Maximum Probability/Correlation.

Figure 3.  Data Base Geometry for Radar
           Image Simulation.

Figure 5.  Radar Image Simulations for Three
           Flight Paths for Site 1.

Figure 6.  Radar Image Simulations for Three
           Flight Paths for Site 2.

Figure 8.  Texture Samples from
           Radar Image Simulations.

Figure 9.  Scatter Diagrams for Texture Feature
           Pairs, Contrast/Maximum Probability,
           at a 0° GLCM Angle as a
           Function of Flight Path.

(d)  Correlation/Maximum Probability
     at a 0° GLCM Angle

Figure 10.  Scatter Diagrams for Texture
            Feature Pairs Combining
            All Flight Paths.

TEXTURE   GEOLOGY AND TOPOGRAPHY

T5        Geology: Rocks of the Lower Pennsylvanian Consisting of
          Alternating Beds of Sandstone and Shale with a Few Beds
          of Coal.
          Topography: Mountains and Ridges with Steep Slopes and a
          Maximum Relief of About 1500 Feet.

T4        Geology: Rocks of the Lower to Middle Ordovician
          Consisting Primarily of Dolomite and Cherty Dolomite with
          Some Beds of Limestone, Shale, and Sandstone.
          Topography: Rolling Hills and Several Ridges with a
          Maximum Relief of 200-300 Feet.

T1        Geology: See T4.
          Topography: Area of Overall Low Relief but with Many
          Small Hills that are Separated by Several Creeks and
          Streams.

T3        Geology: Central Region (See T4) Flanked on Either Side
          by Rocks of the Upper Part of the Middle Cambrian in Beds
          of Dolomite, Limestone, and Slate.
          Topography: Rolling Hills and Elongated Ridges Separated
          by a Trellis Drainage Pattern and Having a Maximum Relief
          of About 500 Feet.

T2        Geology: Rocks of the Upper Precambrian Consisting
          Primarily of Metasediments.
          Topography: Mountains and Hills with Steep Slopes and a
          Maximum Relief of About 1000 Feet.

Table 1.  Geology and Topography of the
          Tennessee Test Area

Keynote Address - Robert M. Haralick, Professor of
Electrical Engineering, Virginia Polytechnic
Institute and State University, Blacksburg, Virginia
"Computer Vision For Remotely Sensed Data"

Math/Stat: First Session

10:30 - 11:15   R. P. Heydorn and Rehka Basu, NASA/JSC
                "Estimating Location Parameters in a Mixture Model"

11:15 - 12:00   David Scott, Rice University
                "Multivariable Density Estimation and Remote Sensing"

12:00 - 1:30    Lunch

Math/Stat: Second Session

1:30 - 2:15     Manouher Naraghi, Jet Propulsion Lab
                "Random Field Models for Use in Scene Segmentation"

2:15 - 3:00     Charles Peters and H. P. Decell, Jr., University of
                Houston
                "Mixture Models for Dependent Observations"

3:00 - 3:30     Break

3:30 - 4:15     Carl Morris and Hubert Kostal, University of Texas
                at Austin
                "An Empirical Bayes Approach to Spatial Estimation"

4:15 - 5:00     L. F. Guseman, Jr., and L. Schumaker, TAMU
                "Spline Classification Methods"

6:00 - 10:00    Social Hour and Banquet, Gilruth Center

Thursday, June 2:

8:30 - 8:45     Announcements

Math/Stat: Third Session

8:45 - 10:00    Emanuel Parzen, TAMU
                "Quantile Data Analysis of Image Data"

10:00 - 10:15   Break

10:15 - 10:45   W. B. Smith and Eugene Shine, TAMU
                "Discrimination Relative to Measures of Non-Normality"

10:45 - 11:15   H. J. Newton, TAMU
                "Repeated Measures Analysis of Image Data"

11:15 - 12:00   Tae H. Joo and Daniel N. Held, Jet Propulsion Lab
                "SAR Speckle Noise Reduction Using Wiener Filter"

12:00 - 1:30    Lunch

Pattern Recognition: First Session

1:30 - 2:15     Larry S. Davis, Fu-pei Hu, Les Kitchen and Vincent
                Hwang, University of Maryland
                "Image Matching Using Hough Transforms"

2:15 - 3:00     Laveen N. Kanal, LNK Corporation
                "Subpixel Registration Accuracy and Modelling"

3:00 - 3:30     Break

3:30 - 4:15     E. M. Mikhail and F. C. Paderes, Jr., Purdue
                University
                "Simulation Aspects for the Study of Rectification of
                Satellite Scanner Data"

4:15 - 5:00     David Dow, National Space Technology Labs
                "Progress in the Scene-to-Map Registration
                Investigation"

Friday, June 3:

8:30 - 8:45     Announcements

Pattern Recognition: Second Session

8:45 - 9:30     Alan Strahler, Waldo Tobler and Curtis Woodcock,
                Hunter College
                "Relating Spatial Patterns in Image Data to Scene
                Characteristics"

9:30 - 10:15    Grahame Smith, SRI International
                "Shape from Shading: An Assessment"

10:15 - 10:30   Break

10:30 - 11:15   K. S. Shanmugan, University of Kansas
                "Power Spectral Density of Markov Texture Fields"

11:15 - 12:00   Discussion Session


LIST  OF  ATTENDEES 


Paul  E.  Anuta,  Purdue  University 

Kenneth  Baker,  NASA/JSC 
T.  C.  Baker,  Lockheed  EMSCO 
Rehka  Basu,  NASA/JSC 
Gary  Breaux,  Texas  A&M  University 

Kristine  Butera,  National  Space  Technology  Laboratories 

Don H. Card, NASA/Ames Research Center
R.  B.  Cate,  Lockheed  EMSCO 
Raj  S.  Chhikara,  Lockheed  EMSCO 
Carolyn  A.  Clark,  Lockheed  EMSCO 

L.  S.  Davis,  University  of  Maryland 

H.  P.  Decell,  Jr.,  University  of  Houston 

David  D.  Dow,  National  Space  Technology  Laboratories 

Jon  Erickson,  NASA/JSC 

Al Feiveson, NASA/JSC
Mary  C.  Ferguson,  NASA/JSC 

L.  F.  Guseman,  Jr.,  Texas  A&M  University 

Forrest  Hall,  NASA/JSC 

Robert  Haralick,  Virginia  Polytechnic  Inst.  & State  University 
R.  P.  Heydorn,  NASA/JSC 
Howard  Hogg,  NASA  Headquarters 
Glen Houston, NASA/JSC

Tae  H.  Joo,  Jet  Propulsion  Laboratory 
David  L.  B.  Jupp,  Hunter  College 

Laveen N. Kanal, LNK Corporation

Hubert Kostal, University of Texas at Austin

Phyllis  Krauss,  Texas  A&M  University 

Richard  Latty,  NASA  Goddard/University  of  Maryland 
Kent  Lennington,  Lockheed  EMSCO 
Gerard  Livingston,  Lockheed  EMSCO 
James  Lundgren,  Lockheed  EMSCO 
R. B. MacDonald, NASA/JSC

Jane Malin, Lockheed EMSCO

Anne  Marie  McAndrew,  NASA/JSC 

Naresh  Mehta,  Lockheed  EMSCO 

Michael  Merickel,  Lockheed  EMSCO 

E.  M.  Mikhail,  Purdue  University 

Robert  Mohler,  Lockheed  EMSCO 

Carl  Morris,  University  of  Texas  at  Austin 

Manouher  Naraghi,  Jet  Propulsion  Laboratory 
H.  J.  Newton,  Texas  A&M  University 

Pat  Odell,  University  of  Texas  at  Dallas 

Fidel  C.  Paderes,  Jr.,  Purdue  University 
Emanuel  Parzen,  Texas  A&M  University 
B.  C.  Peters,  Jr.,  University  of  Houston 
Brooks  Pollock,  Lockheed  EMSCO 

D.  B.  Ramey,  Lockheed  EMSCO 
William E. Rice, NASA/JSC

Donna  Scholtz,  EROS  Data  Center 

David  Scott,  Rice  University 

K.  S.  Shanmugan,  University  of  Kansas 

Sylvia  Shen,  Lockheed  EMSCO 

Grahame  Smith,  SRI  International 

H.  G.  Smith,  Lockheed  EMSCO 

W.  B.  Smith,  Texas  A&M  University 

Charles  Sorensen,  Lockheed  EMSCO 

Mickey Steib, NASA/JSC

Alan  H.  Strahler,  Hunter  College 

Shelby Tilford, NASA Headquarters

Waldo  R.  Tobler,  University  of  California  at  Santa  Barbara 
M.  H.  Trenchard,  Lockheed  EMSCO 
M.  C.  Trichel,  NASA/JSC 

Steven  Wharton,  NASA/GSFC 
Curtis  Woodcock,  Hunter  College 